A network device stores and forwards data packets, and creates and stores context information for data forwarding. A network device according to the prior art can adopt a multi-processor architecture to perform tasks such as storing and forwarding data. Multi-processor architectures mainly include UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access). To improve the efficiency of data access and data processing, the data access architecture shown in FIG. 1 can be employed, in which existing parallel processing software written for the UMA structure is ported to a hardware platform with the NUMA structure.
As shown in FIG. 1, after the software of the UMA structure is simply ported to the hardware platform with the NUMA structure, the data on that platform is distributed across the local memories of multiple processors. In FIG. 1, the local memory of CPU A stores the DP (Data Plane) Configuration, the DP Forwarding Table, the Session Table, and a Buffer; the local memory of CPU B stores the DP Statistic, Other data (that is, data that must be stored during packet processing), and a Buffer. In addition, in FIG. 1, a CPU HWT (HardWare Thread) is a hardware core or hardware thread of the CPU. A processor accesses data held by another processor via an interconnect bus between the CPUs, such as the QPI bus. Each item of the above data has only one copy globally and resides in the local memory of exactly one processor, either CPU A or CPU B. Since a processor needs to access all of the above data during packet processing, some of its accesses will always be remote memory accesses with high latency. Such high-frequency remote memory access greatly reduces the data processing efficiency of the network device. As a result, for network devices such as FW/NGFW, IDS/IPS, WAF, ADC, BDS, and routers, the overall performance may not improve when the software of the UMA structure is simply ported to the hardware platform with the NUMA structure, even though the number of available processors or processor cores is increased.
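The effect described above can be illustrated with a small cost model. This is only a sketch: the placement follows the description of FIG. 1 (the Buffer is omitted because each processor has its own), while the latency figures and the names PLACEMENT, LOCAL_NS, REMOTE_NS, and access_cost are assumptions made for illustration, not measured values or part of the described system.

```python
# Placement of each data structure, following the description of FIG. 1.
# Every item has exactly one global copy, held by CPU "A" or CPU "B".
PLACEMENT = {
    "DP Configuration": "A",
    "DP Forwarding Table": "A",
    "Session Table": "A",
    "DP Statistic": "B",
    "Other data": "B",
}

# Assumed, purely illustrative access latencies in nanoseconds.
LOCAL_NS = 100
REMOTE_NS = 200


def access_cost(cpu, touched):
    """Return (local_count, remote_count, total_ns) for one packet handled on `cpu`."""
    local = sum(1 for name in touched if PLACEMENT[name] == cpu)
    remote = len(touched) - local
    return local, remote, local * LOCAL_NS + remote * REMOTE_NS


# A packet is assumed to touch every data structure once.
ALL_DATA = list(PLACEMENT)

if __name__ == "__main__":
    for cpu in ("A", "B"):
        print(cpu, access_cost(cpu, ALL_DATA))
```

Under these assumptions, neither processor can handle a packet without remote accesses, whichever processor the packet is assigned to.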
To solve the above problem, a data access architecture as shown in FIG. 2 has been proposed. First, the data that is frequently accessed during packet processing is identified, such as the Forwarding Table, Session Table, DP Configuration, and DP Statistic, and this data is copied to the local memory of each processor in the NUMA structure, so that the physical memory of each processor holds its own copy. Second, the software program is modified to add the processor identifier to the index table of the local data, so that when a core or HWT of a given processor processes a data packet, it accesses the local copy of the above data. This reduces the need for remote data access and mitigates the performance degradation associated with the NUMA structure.
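The replication scheme of FIG. 2 can be sketched as follows. The sketch models only the indexing logic: plain Python cannot control which NUMA node holds an object, and the names NODES, replicate, and lookup, as well as the table entries, are hypothetical.

```python
NODES = ("A", "B")  # hypothetical processor identifiers


def replicate(table, nodes=NODES):
    """Give every processor its own independent copy of a frequently read table."""
    return {node: dict(table) for node in nodes}


# Hypothetical forwarding entries: destination prefix -> egress port.
forwarding_table = {"10.0.0.0/8": "port1", "192.168.0.0/16": "port2"}
replicas = replicate(forwarding_table)


def lookup(cpu_id, prefix):
    """Index by processor identifier first, so the read is served by the local copy."""
    return replicas[cpu_id][prefix]


if __name__ == "__main__":
    print(lookup("A", "10.0.0.0/8"))
    print(lookup("B", "192.168.0.0/16"))
```

Every access path in the original software has to be changed to pass the processor identifier in this way, which is the source of the modification effort this approach requires.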
However, this solution requires extensive modifications to the original software, which may introduce uncertainties in execution and increases the workload of software maintenance. In addition, for global data that is written frequently, such as the DP Statistic and Session data, keeping multiple copies of the same data does not meet the needs of real-time query.
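The difficulty with frequently written data can be seen in a minimal sketch with per-processor statistics copies. The counter name rx_packets and the functions below are hypothetical and serve only to illustrate the point.

```python
NODES = ("A", "B")  # hypothetical processor identifiers

# One independent copy of the statistics per processor.
stats = {node: {"rx_packets": 0} for node in NODES}


def count_packet(cpu_id):
    """Each processor updates only its own local copy."""
    stats[cpu_id]["rx_packets"] += 1


def query_local(cpu_id):
    """A query against a single copy sees only that processor's share."""
    return stats[cpu_id]["rx_packets"]


def query_global():
    """A correct global value requires reading every copy, including remote ones."""
    return sum(copy["rx_packets"] for copy in stats.values())


# Three packets are handled on CPU A and two on CPU B.
for _ in range(3):
    count_packet("A")
for _ in range(2):
    count_packet("B")

if __name__ == "__main__":
    print(query_local("A"), query_local("B"), query_global())
```

No single copy holds the current global value, so a real-time query must either aggregate across all copies or rely on synchronization between them.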
At present, there is no effective solution to the degradation in data access performance caused by migrating the distributed structure software program of a UMA-based network device to the hardware platform of a NUMA-based network device.