1. Field of the Invention
The present invention relates to a parallel processing system and a parallel processing method.
2. Description of the Related Art
As a high-speed computer system such as a supercomputer, a high-speed parallel computer system is known in which one job is divided into a plurality of processes (tasks), and a plurality of processors cooperate to execute the tasks in parallel. Thus, the limit on the performance improvement of a single processor can be overcome. A conventional parallel computer system called a perfectly distributed memory parallel computer system is composed of nodes connected by a dedicated network. Each of the nodes is composed of processors or CPUs, a memory, and a communication control unit (CCU). Communication between nodes is carried out via the network. Also, in recent years, a distributed shared memory type parallel computer system (cluster type parallel computer system) has become known, in which the overhead of parallel processing, such as the difficulty of programming and the cost of internode data transfer, is smaller. In such a parallel computer system, each node is composed of an SMP (Symmetric Multiprocessor) and a communication control unit, and a memory is shared by a plurality of processors to the extent that the implementation cost remains appropriate.
In these distributed memory type parallel computer systems, the internode communication processing time is large compared with the calculation processing time in the node. Therefore, the internode communication processing and the calculation processing are overlapped in the CPU, and an asynchronous communication instruction is treated as completed at the time the CPU issues it. Thus, the communication processing time is concealed, and the subsequent process and interrupt process can be executed without waiting for the completion of the requested communication.
Conventionally, a coprocessor is provided separately from a CPU, and communication processing is requested from a user program to an operating system (OS) by a system call from the viewpoint of hardware resource control, as in input/output processing. Thus, the communication processing is executed asynchronously from the CPU. However, this system call requires large software overhead so that the performance improvement of the parallel processing system is hindered.
For this reason, a technique is often adopted in which an asynchronous transfer instruction can be issued directly from the user program. When only the OS issues asynchronous communication instructions, the OS can control them such that the hardware resources are not exhausted. However, when asynchronous transfer instructions are issued directly from the user program, flow control of the asynchronous transfer instructions is necessary to protect the system from performance degradation due to the hardware resource control overhead.
Conventionally, as such a flow control, a hand-shaking system is known in which, when the CCU receives an asynchronous communication request from the CPU, the CCU notifies the CPU that the request has been received. When the hand-shaking reply is received, the CPU interprets the hand-shaking reply as the completion of the asynchronous communication instruction and starts the issuance of a subsequent instruction and an interrupt process.
In the hand-shaking system, as shown in FIGS. 8 and 9, when a request buffer of the CCU 101 is full, the CCU 101 does not return the hand-shaking reply to the CPU 102 until an entry is secured in the request buffer (Steps S102, S103 and S104). Since the asynchronous communication instruction does not complete, the CPU cannot release the interrupt prohibition state. Therefore, there is a problem that the use efficiency of the CPU is remarkably reduced. In particular, in a parallel computer system in which several hundred nodes are connected via an interconnection network, it can take a very long time until an entry is secured in the request buffer, depending on the communication state of the interconnection network.
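The effect of the withheld hand-shaking reply on interrupt handling can be illustrated by the following minimal sketch. It is an illustration only and not part of the conventional system itself; the function name, the tick-based timing and the parameter entry_free_tick (the time at which a request buffer entry becomes available) are assumptions introduced for the example.

```python
def interrupt_latencies(entry_free_tick, interrupt_ticks):
    """Model a CPU stalled by a withheld hand-shaking reply.

    The CPU issues the request at tick 0 and stays in the interrupt
    prohibition state until the CCU replies at entry_free_tick
    (hypothetical parameter). Interrupts raised during the stall are
    serviced only once the reply has arrived.
    """
    latencies = []
    for raised in interrupt_ticks:
        serviced = max(raised, entry_free_tick)
        latencies.append(serviced - raised)
    return latencies


# A buffer entry frees at tick 5; interrupts are raised at ticks 1, 3 and 7.
print(interrupt_latencies(5, [1, 3, 7]))  # [4, 2, 0]
```

In this model, the longer the request buffer stays full, the longer every interrupt raised in the meantime is delayed.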
In conjunction with the above description, a communication control apparatus is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 2-64838). In this reference, a plurality of communication control apparatuses are connected to a host apparatus. The communication control apparatus is composed of a reception buffer receiving and temporarily storing data in accordance with an instruction from the host apparatus. A first section sets a reception buffer full notice flag when the reception data stored in the reception buffer exceed a first predetermined quantity, and resets the reception buffer full notice flag when the reception data stored in the reception buffer fall below a second predetermined quantity which is less than the first predetermined quantity. A flag setting section checks the reception buffer full notice flag when transmitting a transmission frame in response to a transmission instruction from the host apparatus, sets a predetermined bit of a control section of the transmission frame to “1” when the flag is set and to “0” when the flag is reset, and transmits the transmission frame. A second section receives the transmission frame, sets the reception buffer full notice flag when the predetermined bit of the control section is set to “1”, and resets the reception buffer full notice flag when the predetermined bit of the control section is set to “0”. A report section checks the reception buffer full notice flag when a transmission instruction is received from the host apparatus, does not receive transmission data from the host apparatus when the flag is set, and reports that the data stored in the reception buffer exceed the first predetermined quantity.
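The flag behavior described for this reference can be sketched as follows, reading the two predetermined quantities as an upper (set) threshold and a lower (reset) threshold, which is an interpretation of the passage. The class name, method names and threshold values are hypothetical and are not taken from the reference.

```python
class ReceptionBufferFlag:
    """Buffer-full notice flag with two thresholds (hysteresis)."""

    def __init__(self, set_threshold, reset_threshold):
        if reset_threshold >= set_threshold:
            raise ValueError("reset threshold must be below set threshold")
        self.set_threshold = set_threshold
        self.reset_threshold = reset_threshold
        self.full = False

    def update(self, stored):
        """Update the flag from the quantity of data currently stored."""
        if stored > self.set_threshold:
            self.full = True
        elif stored < self.reset_threshold:
            self.full = False
        # Between the two thresholds the flag keeps its previous value.
        return self.full

    def control_bit(self):
        """Bit placed in the control section of an outgoing frame."""
        return 1 if self.full else 0


flag = ReceptionBufferFlag(set_threshold=8, reset_threshold=4)
history = [flag.update(n) for n in (3, 9, 6, 3)]
print(history)  # [False, True, True, False]
```

With two thresholds, the flag does not toggle on every small change in the stored quantity around a single limit.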
Also, a communication control system is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 4-18500). In this reference, the communication control system is composed of a plurality of communication units communicating with each other. The control unit is composed of a communication control section connected to a transmission path and a main control section communicating with the other communication units via the communication control unit. A first reception buffer is provided to store reception data to the main control section, and a second reception buffer is provided to store reception data to the communication control section. A common memory is provided for the main control section and the communication control section. The main control section sets a reception stop flag to the common memory when the first reception buffer is full. The communication control section stores the reception data in the second reception buffer depending on the reception stop flag. The communication control section transmits a transmission stop signal when the second reception buffer is full and sets a transmission stop signal transmission flag. The transmission control section sets a first transmission stop flag to the common memory when the transmission buffer is full. The main control section stops sending of the transmission data to the communication control section depending on the first transmission stop flag. The main control section resets the reception stop flag when the first reception buffer is not full, and sends an interrupt signal to the communication control section. The communication control section transfers the data from the second reception buffer to the first reception buffer in response to the interrupt signal. The communication control section transmits a transmission permission signal when the reception stop flag is reset and the transmission stop signal transmission flag is set. 
When a transmission stop signal is received from another unit, the communication control section sets the second transmission stop flag and stops the transmission of the transmission data to the other unit. When the transmission permission signal is received from another unit, the communication control section resets the second transmission stop flag and restarts the transmission.
Also, a broadcast communication system is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 5-327705). In this reference, data is transmitted from a host computer to a channel via at least two data buffers. A queue buffer (an area in which the host computer registers a communication processing request) is provided in the channel control module to control the data buffer nearest to the channel. For broadcasting the same data (broadcasting data) from the host computer to a plurality of terminals, the CPU of the host computer is provided with a section for detecting an empty state of the queue buffer, a section for stopping registration of a broadcasting data transmission instruction in the queue buffer when the queue buffer is full, and a section for notifying an application program of the terminal for which the registration of the broadcasting data transmission instruction has been stopped.
Also, a communication buffer control apparatus is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 7-93170). In this reference, the communication buffer control apparatus is composed of a buffer remaining quantity table which stores an available remaining buffer quantity, and a task use buffer quantity table which stores a task use buffer quantity which is allocated to every task based on the remaining buffer quantity. When the remaining buffer quantity becomes smaller than a predetermined minimum value or the buffer quantity used by a task becomes larger than an upper limit value, the use of the remaining buffer is stopped and the requesting task is notified. When the remaining buffer quantity becomes larger than the predetermined minimum value and the buffer quantity used by each task becomes smaller than the upper limit value, the use of the remaining buffer is restarted. Thus, fair communication service is given to a plurality of communication links which require processing at the same time. The reception is restarted after a temporary reception stop due to fullness of the buffer. Moreover, detection of a buffer control fault due to a fault of the communication control system becomes easier.
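The two-table control described for this reference may be pictured with the following simplified sketch. It checks the limits before each allocation rather than after the limits have been crossed, which is a simplification; the class name, method names and numeric values are assumptions made for illustration and do not appear in the reference.

```python
class BufferController:
    """Tracks the remaining buffer quantity and per-task usage."""

    def __init__(self, total, min_remaining, task_limit):
        self.remaining = total              # buffer remaining quantity table
        self.min_remaining = min_remaining  # predetermined minimum value
        self.task_limit = task_limit        # per-task upper limit value
        self.used = {}                      # task use buffer quantity table

    def allocate(self, task, quantity):
        """Return True if granted, False if the task is told to stop."""
        new_remaining = self.remaining - quantity
        new_used = self.used.get(task, 0) + quantity
        if new_remaining < self.min_remaining or new_used > self.task_limit:
            return False                    # use stopped; requester notified
        self.remaining = new_remaining
        self.used[task] = new_used
        return True

    def release(self, task, quantity):
        """Return buffers so that later allocations may succeed again."""
        quantity = min(quantity, self.used.get(task, 0))
        self.used[task] = self.used.get(task, 0) - quantity
        self.remaining += quantity


ctl = BufferController(total=10, min_remaining=2, task_limit=5)
results = [ctl.allocate("A", 4), ctl.allocate("A", 2),
           ctl.allocate("B", 4), ctl.allocate("B", 1)]
print(results)  # [True, False, True, False]
ctl.release("A", 4)
print(ctl.allocate("B", 1))  # True
```

The per-task limit keeps one task from taking the whole buffer, which corresponds to the fairness among communication links mentioned above.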
An object of the present invention is to provide a parallel processing system and a parallel processing method in which the use efficiency of a CPU can be improved.
In an aspect of the present invention, a parallel processing system includes a network and a plurality of nodes which communicate asynchronously with one another through the network. Each of the plurality of nodes may include a plurality of CPUs and a communication control unit. Each of the plurality of CPUs as an issuing CPU generates and transmits an asynchronous communication request, retransmits the asynchronous communication request in response to a non-acceptance reply, and executes a subsequent process in response to an acceptance reply. The communication control unit determines whether the asynchronous communication request is acceptable, returns the acceptance reply to the issuing CPU when the asynchronous communication request is acceptable, returns the non-acceptance reply to the issuing CPU when the asynchronous communication request is not acceptable, and executes the asynchronous communication request.
Here, the communication control unit has a request buffer, and the communication control unit determines that the asynchronous communication request is acceptable, when the request buffer is not full, and determines that the asynchronous communication request is not acceptable, when the request buffer is full.
In this case, the communication control unit may store the asynchronous communication request in the request buffer when the asynchronous communication request is acceptable. Also, the communication control unit may discard the asynchronous communication request when the asynchronous communication request is not acceptable.
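The acceptance determination described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation of the communication control unit: the class name, the method names, the reply strings and the buffer capacity are hypothetical, and executing a request is reduced to removing it from the buffer.

```python
from collections import deque

ACCEPT = "acceptance"
NON_ACCEPT = "non-acceptance"


class CommunicationControlUnit:
    """Accepts a request only while the request buffer has a free entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.request_buffer = deque()

    def receive(self, request):
        """Return the reply that would be sent to the issuing CPU."""
        if len(self.request_buffer) < self.capacity:
            self.request_buffer.append(request)  # store when acceptable
            return ACCEPT
        return NON_ACCEPT                        # request is discarded

    def execute_one(self):
        """Execute (here: simply remove) the oldest buffered request."""
        if self.request_buffer:
            return self.request_buffer.popleft()
        return None


ccu = CommunicationControlUnit(capacity=2)
replies = [ccu.receive(r) for r in ("r1", "r2", "r3")]
print(replies)            # ['acceptance', 'acceptance', 'non-acceptance']
ccu.execute_one()         # frees one entry
print(ccu.receive("r3"))  # acceptance
```

Because a reply is returned in either case, the issuing CPU is never left without an answer while the request buffer is full.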
Also, the issuing CPU enters a waiting mode after transmitting the asynchronous communication request to the communication control unit, and the issuing CPU is in an interrupt prohibition state in the waiting mode.
Also, the issuing CPU may include an instruction issuing control section which generates and transmits the asynchronous transfer request to the communication control unit, and a reply receiving register which receives the acceptance reply or the non-acceptance reply from the communication control unit. In this case, the instruction issuing control section of the issuing CPU sets the issuing CPU to a waiting mode in which reception of an interrupt is prohibited, after transmitting the asynchronous communication request to the communication control unit. The reply receiving register releases the waiting mode when the acceptance reply is received from the communication control unit.
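The waiting mode of the issuing CPU may be pictured with the following sketch. The class, its attributes and the send callback are hypothetical. The sketch further assumes that an interrupt raised during the waiting mode is held pending rather than lost. The passage above states only that the acceptance reply releases the waiting mode; what the reply receiving register does on a non-acceptance reply is not specified here, so the sketch merely records that a retransmission is needed.

```python
class IssuingCpu:
    """Toy model of the issuing control section and reply register."""

    def __init__(self):
        self.waiting = False            # waiting mode: interrupts prohibited
        self.retransmit_needed = False
        self.pending_interrupts = []    # assumption: held, not dropped
        self.serviced = []

    def issue(self, send, request):
        """Transmit the request, then enter the waiting mode."""
        send(request)
        self.waiting = True
        self.retransmit_needed = False

    def raise_interrupt(self, name):
        """Service an interrupt now, or hold it during the waiting mode."""
        if self.waiting:
            self.pending_interrupts.append(name)
        else:
            self.serviced.append(name)

    def reply_register(self, reply):
        """Receive the reply from the communication control unit."""
        if reply == "acceptance":
            self.waiting = False        # release the waiting mode
            self.serviced.extend(self.pending_interrupts)
            self.pending_interrupts.clear()
        else:
            self.retransmit_needed = True


sent = []
cpu = IssuingCpu()
cpu.issue(sent.append, "req")
cpu.raise_interrupt("timer")      # held while waiting
cpu.reply_register("acceptance")
cpu.raise_interrupt("io")         # serviced immediately
print(cpu.serviced)               # ['timer', 'io']
```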
Also, the communication control unit may include a request control section and a communication executing section. The request control section determines whether the asynchronous communication request is acceptable, and returns the acceptance reply to the issuing CPU when the asynchronous communication request is acceptable, and the non-acceptance reply to the issuing CPU when the asynchronous communication request is not acceptable. The communication executing section receives the asynchronous communication request and executes the asynchronous communication request.
In this case, the request control section may include a request receiving section which receives the asynchronous transfer request from the issuing CPU; a determining section which determines whether the asynchronous transfer request is acceptable; and a request control section which has the request buffer, and stores the asynchronous transfer request in the request buffer when the asynchronous transfer request is determined to be acceptable by the determining section.
In another aspect of the present invention, a parallel processing method is provided for a system including a plurality of nodes, each of which may include a plurality of CPUs and a communication control unit. The parallel processing method may be attained by (a) issuing an asynchronous communication request from one of the plurality of CPUs as an issuing CPU; by (b) setting the issuing CPU to a waiting state in which a process change is prohibited, after the issuance; by (c) determining, in the communication control unit, whether the asynchronous communication request is acceptable; by (d) returning an acceptance reply from the communication control unit to the issuing CPU when the asynchronous communication request is acceptable; and by (e) releasing the issuing CPU from the waiting state in response to the acceptance reply.
Here, the parallel processing method may further include (f) executing a subsequent process in response to the acceptance reply.
Also, the parallel processing method may further include (g) returning a non-acceptance reply from the communication control unit to the issuing CPU when the asynchronous communication request is not acceptable; and (h) reissuing the asynchronous communication request from the issuing CPU to the communication control unit in response to the non-acceptance reply.
Also, the parallel processing method may further include (i) executing the asynchronous communication request.
Also, the (c) determining step may be attained by determining that the asynchronous communication request is acceptable, when a request buffer in the communication control unit is not full; and by storing the asynchronous communication request in the request buffer when the request buffer is not full.
Also, the (c) determining step may be attained by determining that the asynchronous communication request is not acceptable, when a request buffer in the communication control unit is full; and by discarding the asynchronous communication request when a request buffer in the communication control unit is full.
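The order of steps (a) to (i) for a single request may be traced with the following sketch. The function name, the trace labels and the max_attempts guard are hypothetical. The point at which the communication control unit executes a buffered request relative to the reissue is an assumption made so that the example terminates; the method above does not fix that timing.

```python
def run_method(capacity, buffered, request, max_attempts=10):
    """Trace steps (a)-(i) for one request against a request buffer."""
    buffer = list(buffered)
    trace = []
    for attempt in range(max_attempts):
        # (a) first issuance, or (h) reissue after a non-acceptance reply.
        trace.append("a:issue" if attempt == 0 else "h:reissue")
        trace.append("b:wait")
        if len(buffer) < capacity:           # (c) buffer not full
            buffer.append(request)           # store the request
            trace.append("d:acceptance")
            trace.append("e:release")
            trace.append("f:subsequent")
            break
        trace.append("g:non-acceptance")     # (c) full: request discarded
        if buffer:
            buffer.pop(0)                    # (i) execute an older request
            trace.append("i:execute")
    return trace, buffer


trace, buffer = run_method(capacity=2, buffered=["x", "y"], request="req")
print(trace)
print(buffer)  # ['y', 'req']
```

In this run the first issuance meets a full request buffer and is answered with a non-acceptance reply, and the reissued request is accepted once an entry has been freed.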