NUMA (Non-Uniform Memory Access) is a distributed architecture in which each processor is attached to its own memory, peripherals, and the like. Every processor-memory group is connected into a single system, which gives NUMA an advantage in scalability. Owing to features such as high reliability, high availability, and high serviceability, NUMA has been widely applied in mid-range and high-end servers.
Any processor in a NUMA node may access any memory in the system, so a processor experiences different delays when accessing different memories. As the system is extended, the number of NUMA nodes grows, and the delay for a processor to access a remote node increases greatly, which affects the overall performance of the system. In particular, for data that is frequently accessed in the system (for example, the kernel code and the kernel read-only data), if the data exists in only one node, processors of other nodes may experience a large delay when accessing it. Moreover, if the data is accessed by processors of multiple nodes within a short time, the transmission bandwidth of the interconnect hardware becomes another factor that limits performance.
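The effect of keeping frequently accessed data in a single node can be illustrated with a simple model. The sketch below is a simplification under stated assumptions: accesses are assumed to come uniformly from all nodes, every remote access is assumed to cost the same, and the latency figures are hypothetical values chosen only for illustration. The function name `expected_latency_ns` is likewise illustrative.

```python
def expected_latency_ns(num_nodes, local_ns, remote_ns):
    """Mean access latency for data stored in exactly one node.

    Assumptions (illustrative only): accesses originate uniformly from
    all nodes, and every remote access has the same cost.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Only accesses issued from the node that holds the data are local.
    local_fraction = 1 / num_nodes
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 300 ns remote.
    for nodes in (1, 2, 4, 8):
        print(nodes, expected_latency_ns(nodes, 100, 300))
```

Under these assumptions the mean latency approaches the remote latency as the number of nodes grows, even before any increase in the remote latency itself is taken into account.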
In related techniques, data that is frequently accessed in the system may be copied to the memory of each node. In this way, each node has a local copy, and a process running on a node may access that node's local copy. This avoids the delay caused by having to access frequently accessed data at another node and reduces the consumption of transmission bandwidth of the interconnect hardware.
Taking the kernel code and the kernel read-only data as examples, kernel multi-copy may be implemented as follows. Sufficient memory is allocated on each node, and the kernel code and the kernel read-only data are copied into the newly allocated area. A mapping relationship between the kernel copy in each node and the corresponding linear address is obtained through calculation. When a process is scheduled to a certain node, some entries of the process page directory table are modified based on the mapping relationship of the kernel copy saved in that node, so that the process may access the kernel code copy of that node through the content of the process page directory table.
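The kernel multi-copy procedure described above may be sketched as a small simulation. This is not kernel code and is not taken from any particular implementation. The page directory is modeled as a dictionary, and the slot range `KERNEL_SLOTS`, the 4 MiB region size, the base addresses, and the function names are all hypothetical choices made for illustration.

```python
# Hypothetical page-directory slots that map the kernel code and
# kernel read-only data (illustrative values only).
KERNEL_SLOTS = range(768, 772)


def build_node_mapping(copy_base, region_size=4 << 20):
    """Compute the slot -> physical address mapping for one node's kernel copy.

    copy_base stands in for the node-local area into which the kernel
    code and read-only data were copied.
    """
    return {slot: copy_base + i * region_size
            for i, slot in enumerate(KERNEL_SLOTS)}


def schedule_on_node(page_directory, node_mapping):
    """Rewrite only the kernel entries so they point at the node's copy."""
    page_directory.update(node_mapping)


if __name__ == "__main__":
    node_mappings = {
        0: build_node_mapping(0x1000_0000),
        1: build_node_mapping(0x9000_0000),
    }
    page_directory = {0: 0x0AAA_A000}  # one user-space entry, left untouched
    schedule_on_node(page_directory, node_mappings[1])
    print(hex(page_directory[768]))
```

In this sketch, scheduling the process onto a node changes only the kernel entries of its page directory, so the same linear addresses resolve to the copy held in that node while user-space entries are unaffected.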
Through the kernel multi-copy technology, a process is enabled to access the kernel code and the kernel read-only data on the node where it runs. However, if the process creates multiple threads and the threads are distributed to different nodes for execution, the threads still run on the basis of the content of the process page directory table. In this case, the kernel copy pointed to by the process page directory table is located in only one node, so threads running on other nodes cannot access the kernel copies saved in their own nodes. A large delay may therefore still be generated, and the access remains limited by the transmission bandwidth of the interconnect hardware.
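The limitation described above may be made concrete with a minimal model in which all threads of a process share one page directory. The class and function names below are hypothetical, and the model records only which node's kernel copy the shared page directory points to.

```python
from dataclasses import dataclass


@dataclass
class Process:
    # Node whose kernel copy the single, shared page directory points to.
    kernel_copy_node: int


@dataclass
class Thread:
    process: Process
    running_on_node: int


def kernel_access_is_remote(thread):
    """A thread reaches the kernel through its process's page directory,
    so the access is remote whenever the thread runs on another node."""
    return thread.process.kernel_copy_node != thread.running_on_node


if __name__ == "__main__":
    process = Process(kernel_copy_node=0)
    threads = [Thread(process, node) for node in range(4)]
    for t in threads:
        print(t.running_on_node, kernel_access_is_remote(t))
```

With the page directory pointing at the copy in node 0, only the thread running on node 0 accesses the kernel locally. Every other thread crosses the interconnect even though its own node holds a kernel copy.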