Generally, a cache is a high-speed buffer memory designed to improve the performance of a central processing unit (CPU); it exchanges data with main memory in blocks. Today, caches are used not only to buffer local application data or instructions for the CPU, but also to buffer network data packets, improving the efficiency with which the CPU processes those packets.
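To make "in blocks" concrete, here is a minimal sketch of how a byte address maps to the block that moves as a unit between main memory and the cache. It assumes a hypothetical 64-byte block size, which the text does not specify, and `block_span` is an illustrative name rather than anything from the source.

```python
from typing import Tuple


def block_span(address: int, block_size: int = 64) -> Tuple[int, int]:
    """Return the [start, end) byte range of the block that holds `address`."""
    # With a power-of-two block size, clearing the low bits gives the block start.
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    start = address & ~(block_size - 1)
    return start, start + block_size


# Reading the single byte at address 130 transfers the whole block covering bytes 128..191.
print(block_span(130))  # (128, 192)
```

Because whole blocks are transferred, a miss on one byte also brings its neighbours into the cache, which is why block-granular transfer pays off for sequentially accessed data such as packet payloads.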
As user demands grow, multi-core communication processors are being used ever more widely. A multi-core communication processor has multiple CPUs and multiple network ports (the network ports are the most important inputs/outputs (IOs) of the processor). In its main operating flow, the CPUs perform processing such as classification and encapsulation on data packets arriving through the network ports, and then send the processed data packets out through the network ports. Each CPU has its own independent local cache, which is used exclusively by that CPU to speed up its reads and writes. In addition, a system cache on the system bus is shared by all CPUs and all network ports; it stores data exchanged between CPUs and between CPUs and network ports, further improving the read and write efficiency of the CPUs. The multi-core communication processor also includes a packet scheduling module, which is mainly configured to select the CPU that processes each data packet. This scheduling can be implemented in two modes: Pipeline and Run to Complete (RTC). In Pipeline mode, one data packet is processed in multiple steps, each performed by a different CPU in sequence. In RTC mode, one data packet is processed by a single CPU from start to end.
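The difference between the two scheduling modes can be sketched as a plan that assigns processing stages to CPUs. This is a minimal illustration, not the packet scheduling module's actual algorithm: `schedule`, `run`, and the stage functions `classify` and `encapsulate` are hypothetical names, and the packet is modelled as a byte string.

```python
from typing import Callable, List, Tuple

Stage = Callable[[bytes], bytes]
Plan = List[Tuple[int, List[Stage]]]  # (cpu_id, stages that CPU runs), in execution order


def schedule(stages: List[Stage], cpus: List[int], mode: str) -> Plan:
    """Hypothetical scheduler: decide which CPU runs which stages for one packet."""
    if mode == "RTC":
        # Run to Complete: one CPU handles the packet from start to end.
        return [(cpus[0], list(stages))]
    if mode == "Pipeline":
        if len(cpus) < len(stages):
            raise ValueError("Pipeline mode needs one CPU per stage")
        # Pipeline: stage i runs on CPU i, one after another.
        return [(cpu, [stage]) for cpu, stage in zip(cpus, stages)]
    raise ValueError(f"unknown mode: {mode}")


def run(packet: bytes, plan: Plan) -> bytes:
    """Apply the stages in plan order (CPU parallelism is not modelled)."""
    for _cpu, cpu_stages in plan:
        for stage in cpu_stages:
            packet = stage(packet)
    return packet


def classify(p: bytes) -> bytes:  # hypothetical stage: prepend a class tag
    return b"C" + p


def encapsulate(p: bytes) -> bytes:  # hypothetical stage: prepend a header
    return b"HDR" + p


rtc_plan = schedule([classify, encapsulate], [0, 1], "RTC")
pipe_plan = schedule([classify, encapsulate], [0, 1], "Pipeline")
print([cpu for cpu, _ in rtc_plan], [cpu for cpu, _ in pipe_plan])  # [0] [0, 1]
```

Both plans produce the same processed packet; they differ only in how many CPUs touch it, which matters once each hand-off requires fetching the packet from the shared system cache again.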
Specifically, processing a data packet involves the following steps:
(1) A network port receives a data packet, parses it to obtain a header descriptor, and sends the header descriptor to the packet scheduling module; the data packet is written into the system cache.
(2) The packet scheduling module selects, according to a particular scheduling algorithm, an idle CPU to process the data packet, and sends the header descriptor of the data packet to that CPU to instruct it to process the packet.
(3) The CPU reads the data packet from the system cache and processes the data packet.
(4) After completing the processing, the CPU writes the processed data packet back to the system cache.
(5) The network port sends the data packet out.
If the RTC mode is used, steps (3) and (4) are executed only once, by a single CPU; if the Pipeline mode is used, steps (3) and (4) are executed multiple times, by multiple CPUs in sequence.
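The five steps can be traced with a toy model in which the system cache and the DDR are plain dictionaries. This is a sketch under stated assumptions: `ToyProcessor`, `handle_packet`, and the `passes` plan format are hypothetical, the scheduler's choice in step (2) is supplied by the caller, and nothing is ever evicted, so the DDR fallback is only exercised when a packet is placed there directly.

```python
class ToyProcessor:
    """Hypothetical model: the system cache and the DDR are dicts keyed by packet ID."""

    def __init__(self):
        self.system_cache = {}
        self.ddr = {}
        self.hits = 0
        self.misses = 0

    def read(self, packet_id):
        # Step (3) lookup: try the system cache first; on a miss, fall back to the slower DDR.
        if packet_id in self.system_cache:
            self.hits += 1
            return self.system_cache[packet_id]
        self.misses += 1
        return self.ddr[packet_id]

    def handle_packet(self, packet_id, payload, passes):
        """`passes` is a list of (cpu_id, stage_fn): one entry for RTC, several for Pipeline."""
        # (1) The port parses the packet; the packet is written into the system cache.
        self.system_cache[packet_id] = payload
        descriptor = {"id": packet_id, "length": len(payload)}
        # (2) The scheduler's CPU choice is given by the caller as `passes`.
        trace = []
        for cpu_id, stage in passes:
            data = self.read(descriptor["id"])                 # (3) CPU reads the packet
            self.system_cache[descriptor["id"]] = stage(data)  # (4) CPU writes it back
            trace.append(cpu_id)
        # (5) The port sends the processed packet out.
        return self.system_cache[packet_id], trace
```

With an RTC plan of one pass, the packet is read in step (3) once (`hits == 1`); a two-stage Pipeline plan reads it twice, and each extra read is another opportunity to miss once the cache is under replacement pressure.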
As the process shows, the most important factor affecting the performance of the multi-core communication processor is the hit rate of the CPU in the system cache in step (3), that is, the probability that the CPU finds the data packet in the system cache in step (3). Because the capacity of the system cache is limited, as data packets continuously enter the system cache from the network ports, new data packets continuously replace old ones, and the replaced packets are transferred to a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM, hereinafter DDR). If the CPU can obtain the data packet directly from the system cache in step (3), the CPU accesses the packet efficiently; conversely, if it cannot, the data packet must be fetched from the DDR, which greatly decreases the efficiency of the CPU's access to the data packet.
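The effect of the hit rate can be quantified with the standard average-access-time model. This model is an assumption introduced for illustration, not a formula from the text: h is the step (3) hit rate, T_cache the system cache access latency, and T_DDR the latency of fetching the packet from the DDR.

```latex
T_{\text{avg}} = h \, T_{\text{cache}} + (1 - h) \, T_{\text{DDR}}, \qquad 0 \le h \le 1
```

With hypothetical latencies of T_cache = 10 ns and T_DDR = 100 ns, a hit rate of h = 0.9 gives 0.9 × 10 + 0.1 × 100 = 19 ns per access, while h = 0.5 gives 0.5 × 10 + 0.5 × 100 = 55 ns, nearly three times slower. Because T_DDR dominates, even a small drop in h noticeably raises the average.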
The prior art provides an algorithm for replacing data packets in the system cache, namely the Least Recently Used (LRU) method. This method assumes that CPU accesses exhibit temporal and spatial locality: data recently accessed by the CPU is very likely to be accessed frequently in the next time period, and conversely, data that has not been accessed for a long time is unlikely to be accessed in the near future. Therefore, when a packet in the system cache needs to be replaced, the least recently used packet is transferred from the system cache to the DDR. Many multi-core communication processors in the industry use the LRU algorithm, and some use LRU-like algorithms. For example, the system cache of Freescale's P4080 chip uses a Pseudo LRU (PLRU) algorithm, and the system cache of LSI's ACP chip uses a True LRU algorithm.
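Both replacement policies can be sketched briefly. `LRUSystemCache` is a hypothetical fully associative packet cache using True LRU, with evicted packets moved to a dictionary standing in for the DDR. `TreePLRU4` is the common 3-bit tree PLRU for a single 4-way set; it illustrates the general PLRU technique and is not claimed to be the exact variant used in the P4080. On a read miss, this sketch serves the packet from DDR without refilling the cache, a simplification that real caches may not make.

```python
from collections import OrderedDict


class LRUSystemCache:
    """True-LRU sketch: evicted packets move to a dict standing in for DDR."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # packet_id -> payload, least recently used first
        self.ddr = {}

    def write(self, packet_id, payload):
        if packet_id in self.lines:
            self.lines.move_to_end(packet_id)  # a write also counts as a use
        self.lines[packet_id] = payload
        if len(self.lines) > self.capacity:
            # Replace the least recently used packet by transferring it to DDR.
            victim_id, victim_payload = self.lines.popitem(last=False)
            self.ddr[victim_id] = victim_payload

    def read(self, packet_id):
        """Return (hit, payload); a miss is served from DDR without refilling the cache."""
        if packet_id in self.lines:
            self.lines.move_to_end(packet_id)
            return True, self.lines[packet_id]
        return False, self.ddr.get(packet_id)


class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set: 3 bits instead of a full recency order."""

    def __init__(self):
        # bits[0]: root (0 = victim in ways 0-1, 1 = victim in ways 2-3)
        # bits[1]: left node (0 = way 0, 1 = way 1); bits[2]: right node (0 = way 2, 1 = way 3)
        self.bits = [0, 0, 0]

    def touch(self, way: int):
        # On an access, point each node on the path away from the accessed way.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[0] = 0
            self.bits[2] = 1 if way == 2 else 0

    def victim(self) -> int:
        # Follow the bits from the root down to the way to replace.
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3
```

PLRU only approximates LRU: after the access sequence 0, 1, 2, 3, 2, 0, `TreePLRU4` selects way 3, whereas True LRU would replace way 1. That is the accuracy traded for storing three bits per set instead of a full recency order.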