1. Field of the Invention
The present invention relates to information processing apparatuses, information processing methods, and computer programs. More particularly, the present invention relates to an information processing apparatus that has a structure in accordance with non-uniform memory access (NUMA), which is the architecture of a shared-memory multiprocessor system, an information processing method, and a computer program.
2. Description of the Related Art
In recent years, there have been an increasing number of multiprocessor information processing apparatuses that have a plurality of processors (central processing units (CPUs)) and realize efficient data processing by performing parallel processing using the processors. In such a multiprocessor system, a plurality of processors access a shared memory. NUMA is one architecture for such a system, in which the access cost from each processor to the memory is not uniform.
FIG. 1 shows a structure example of an information processing apparatus with a NUMA architecture. As shown in FIG. 1, a plurality of chipsets 11 and 21 are interconnected as nodes to a crossbar switch. An address conversion table 31 for converting an address at the time of accessing, from one node, a memory connected to another node is connected to the crossbar switch.
A CPU-1 12, a memory-1 13, and a device-1 14 are connected to the chipset 11 via a system bus 1 serving as a local bus. A CPU-2 22, a memory-2 23, and a device-2 24 are connected to the chipset 21 via a system bus 2 serving as a local bus.
The memory-1 13 and the memory-2 23 are shared by the CPU-1 12 and the CPU-2 22. In NUMA with such a shared-memory structure, access cost from each CPU to each memory is not uniform.
For example, when a task running on the CPU-1 12 accesses data stored in the memory-2 23, the memory-2 23 must be accessed via the system bus 1 of the chipset 11, the crossbar switch, and the system bus 2 of the chipset 21. In this manner, when the CPU on which a task is running and the memory in which data is stored are not on the same local bus (system bus), the memory access cost increases.
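The access path described above can be illustrated with a minimal sketch. The table `NODE_OF` and the function `access_path` are hypothetical names introduced only for illustration; the sketch models the FIG. 1 topology as two nodes and lists the interconnects crossed, without assigning any measured cost to them.

```python
# Hypothetical model of the FIG. 1 topology. Names are illustrative
# assumptions and do not come from the source.
NODE_OF = {
    "CPU-1": 1, "memory-1": 1, "device-1": 1,  # chipset 11, system bus 1
    "CPU-2": 2, "memory-2": 2, "device-2": 2,  # chipset 21, system bus 2
}


def access_path(initiator, target):
    """Return the interconnects crossed when initiator accesses target."""
    src, dst = NODE_OF[initiator], NODE_OF[target]
    if src == dst:
        # Both components sit on the same local bus.
        return [f"system bus {src}"]
    # A remote access leaves the local bus, crosses the crossbar switch,
    # and enters the local bus of the other node.
    return [f"system bus {src}", "crossbar switch", f"system bus {dst}"]


print(access_path("CPU-1", "memory-1"))
print(access_path("CPU-1", "memory-2"))
```

Under these assumptions, the local access from the CPU-1 to the memory-1 crosses one interconnect, whereas the remote access from the CPU-1 to the memory-2 crosses three, which is the source of the non-uniform access cost.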
Numerous measures have already been proposed to improve the performance related to memory access cost in a system with a NUMA architecture. For example, Japanese Patent No. 3832833 (International Business Machines Corp. (IBM)) proposes a structure that realizes low delay in coherency communication at the time data is given via a bus other than a local bus in response to a read request from a CPU.
Also, Japanese Patent No. 3924206 (IBM) eliminates unnecessary coherency communication by setting a write through indicator in correspondence with data and determining whether details of a change can be cached.
Furthermore, Japanese Unexamined Patent Application Publication No. 2006-39822 (Canon Inc.) discloses a structure that speculatively repeats task allocation to a multiprocessor and determines an optimal combination of a task and a processor on the basis of the value of a communication cost involved in each CPU.
However, these techniques of the related art only attempt to optimize memory access from a CPU to a memory. As devices that input and output large amounts of data have appeared in recent years, a device driver may place a high load on a CPU. It is thus necessary to achieve optimization that also takes devices into consideration.
For example, with the foregoing techniques of the related art, there is no advantageous effect in an information processing apparatus in which a CPU and a memory are on the same local bus, but a device and a memory to be accessed by the device are not on the same local bus. Specifically, there is no advantageous effect in the structure shown in FIG. 2.
Referring to FIG. 2, as in FIG. 1, the chipsets 11 and 21 are interconnected as nodes to the crossbar switch. The address conversion table 31 for converting an address at the time of accessing, from one node, a memory connected to another node is connected to the crossbar switch.
The CPU-1 12, the memory-1 13, and the device-1 14 are connected to the chipset 11 via the system bus 1 serving as a local bus. The CPU-2 22, the memory-2 23, and the device-2 24 are connected to the chipset 21 via the system bus 2 serving as a local bus.
A device driver 41 for the device-1 14 connected to the system bus 1 on the chipset 11 side is set up on the CPU-2 22 connected to the system bus 2 on the chipset 21 side.
By activating the device driver 41 set up on the CPU-2 22 on the chipset 21 side, the device-1 14 on the chipset 11 side starts operating, and the device-1 14 performs data processing. For example, when the device-1 14 is a network card, the device-1 14 performs communication processing with the outside via a network. Alternatively, when the device-1 14 is a video card, the device-1 14 performs image data processing.
Data 43 to be processed by the device-1 14, such as communication data or video data, is stored in the memory-2 23 on the chipset 21 side by performing direct memory access (DMA) via the crossbar switch. Also, when obtaining data 42 from the memory-2 23, the device-1 14 performs DMA via the crossbar switch.
These techniques of the related art only attempt to optimize memory access from a CPU to a memory. Therefore, in a structure where data is transferred among a device, a memory, and a CPU, such as the information processing apparatus shown in FIG. 2 in which a device and a memory are not on the same local bus, the techniques of the related art do not achieve a sufficient advantageous effect.
Furthermore, Japanese Patent No. 3123425 (Nippon Electric Co., Ltd. (NEC)) discloses a structure in which load distribution is performed, using a neural network, by allocating an interrupt from a device to the CPU with the lowest load at that point in time. This technique focuses on the load on each CPU.
Even with this structure, as shown in FIG. 3, for example, when a memory to be accessed by a device is not on a local bus to which a CPU is connected, memory access from a driver for the device is performed via a crossbar switch, and accordingly, the load on the CPU on which the driver is running is increased.
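The limitation of load-only allocation can be illustrated with a minimal sketch. The names `allocate_interrupt` and `is_local` and the load values are hypothetical, and the neural network mentioned above is not modeled; only the rule of choosing the CPU with the lowest load is shown.

```python
# Hypothetical load figures and names. The neural network of the related
# art is not modeled; only the "lowest load wins" rule is illustrated.
CPU_NODE = {"CPU-1": 1, "CPU-2": 2}        # node (chipset) of each CPU
DEVICE_NODE = {"device-1": 1, "device-2": 2}  # node (chipset) of each device


def allocate_interrupt(cpu_load):
    """Return the CPU with the lowest load (ties resolved by name order)."""
    return min(sorted(cpu_load), key=lambda cpu: cpu_load[cpu])


def is_local(cpu, device):
    """True when the CPU and the device are on the same local bus."""
    return CPU_NODE[cpu] == DEVICE_NODE[device]


load = {"CPU-1": 0.80, "CPU-2": 0.20}
chosen = allocate_interrupt(load)
print(chosen, is_local(chosen, "device-1"))
```

With these assumed loads, the interrupt from the device-1 is allocated to the CPU-2, which is not on the same local bus as the device-1. The selection rule has no term for locality, so the chosen CPU may have to reach the device's data via the crossbar switch.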
An information processing apparatus shown in FIG. 3 has a structure similar to structures shown in FIGS. 1 and 2. The chipsets 11 and 21 are interconnected as nodes to the crossbar switch. The address conversion table 31 for converting an address at the time of accessing, from one node, a memory connected to another node is connected to the crossbar switch.
The CPU-1 12, the memory-1 13, and the device-1 14 are connected to the chipset 11 via the system bus 1 serving as a local bus. The CPU-2 22, the memory-2 23, and the device-2 24 are connected to the chipset 21 via the system bus 2 serving as a local bus.
The device driver 41 for the device-1 14 connected to the system bus 1 on the chipset 11 side is set up on the CPU-2 22 connected to the system bus 2 on the chipset 21 side.
By activating the device driver 41 set up on the CPU-2 22 on the chipset 21 side, the device-1 14 on the chipset 11 side starts operating, and the device-1 14 performs data processing. For example, when the device-1 14 is a network card, the device-1 14 performs communication processing with the outside via a network. Alternatively, when the device-1 14 is a video card, the device-1 14 performs image data processing.
In this structure example, unlike the structure shown in FIG. 2, the data 43 processed by the device-1 14 is stored as data 44 by performing DMA to the memory-1 13, which is connected to the system bus 1 serving as the local bus of the same chipset. Also, when obtaining data from the memory-1 13, the device-1 14 performs DMA.
In this structure, the device-1 14 can access the memory-1 13 without going through the crossbar switch, and accordingly, the memory access cost can be reduced. However, even in this structure, the driver 41 on the CPU-2 22 on the chipset 21 side must access the memory-1 13 via the crossbar switch, and accordingly, the load on the CPU on which the driver is running is increased.
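The difference between the structures of FIG. 2 and FIG. 3 can be summarized with a minimal sketch. The names `NODE_OF`, `crosses_crossbar`, and `remote_transfers` are hypothetical and introduced only for illustration; the sketch merely reports which transfers pass through the crossbar switch for a given placement of the data buffer.

```python
# Hypothetical model; names are illustrative assumptions, not from the source.
NODE_OF = {
    "CPU-1": 1, "memory-1": 1, "device-1": 1,  # chipset 11, system bus 1
    "CPU-2": 2, "memory-2": 2, "device-2": 2,  # chipset 21, system bus 2
}


def crosses_crossbar(a, b):
    """True when components a and b are on different local buses."""
    return NODE_OF[a] != NODE_OF[b]


def remote_transfers(driver_cpu, device, buffer_memory):
    """List the transfers that must pass through the crossbar switch."""
    transfers = {
        "device DMA": (device, buffer_memory),
        "driver access": (driver_cpu, buffer_memory),
    }
    return [name for name, (a, b) in transfers.items() if crosses_crossbar(a, b)]


# FIG. 2: driver on the CPU-2, data buffer in the memory-2.
fig2 = remote_transfers("CPU-2", "device-1", "memory-2")
# FIG. 3: driver on the CPU-2, data buffer in the memory-1.
fig3 = remote_transfers("CPU-2", "device-1", "memory-1")
print("FIG. 2:", fig2)
print("FIG. 3:", fig3)
```

Under these assumptions, moving the buffer from the memory-2 to the memory-1 makes the DMA of the device-1 local but makes the driver's access remote, so neither placement removes the crossbar switch from the data path while the driver remains on the CPU-2.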