The present invention relates to installation of a program into nodes independent of one another in a parallel computer system including plural nodes.
An installing method for use in a parallel computer system is disclosed, for example, in JP-A-11-296349, wherein upon completion of installation into a particular node, this node serves as a server machine to sequentially install software into different nodes to reduce time required for the installation in the overall system.
Also, JP-A-6-309261 discloses a method including a step of sending an install instruction from a server machine, a step of requesting required software from client machines to the server machine, and a step of starting installation of the software into plural client machines.
Further, JP-A-6-59994 discloses a method including a step of sending an install start instruction and install information from a primary station computer device to plural secondary station computer devices to install a program into plural client machines.
A parallel computer system may include from several tens to several thousands or more nodes, because such a system is required to execute large-scale computations. When the same programs are installed into these nodes, the time required for installing the programs must be reduced. In the prior art JP-A-11-296349, assuming that the number of nodes in a system is N and the time required for installation per node is T, the time required for installation into all nodes is expressed by (log₂N)×T.
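As a rough illustration of how the prior-art estimate grows with the node count, the expression (log₂N)×T can be evaluated directly. The following is a minimal sketch; the function name and the sample value of T are assumptions chosen for illustration and do not appear in the cited reference.

```python
import math


def prior_art_install_time(num_nodes: int, time_per_node: float) -> float:
    """Evaluate (log2 N) x T, the prior-art estimate quoted above.

    num_nodes is N and time_per_node is T; the unit of T is arbitrary.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return math.log2(num_nodes) * time_per_node


if __name__ == "__main__":
    # Hypothetical T = 10 time units per node.
    for n in (16, 1024, 4096):
        print(n, prior_art_install_time(n, 10.0))
```

Under this estimate the total time still increases with N, although only logarithmically.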
It is an object of the present invention to further reduce the above installation time (log₂N)×T required for installing into plural nodes.
The present invention is characterized by simultaneously performing install processing in plural nodes by simultaneously transferring data on a program to be installed, utilizing communication means interconnecting the respective nodes.
An arbitrary node in a parallel computer system reads the programs, a predefined amount at a time, from a storage medium which stores the programs, and delivers the program data through the communication means to all nodes into which the programs are to be installed. Each node receives the data and writes the data into its own storage device, so that the same program is installed in all nodes in parallel.
Also, a master install control program for distributing a program is executed by one node or an install device in the parallel computer system. The master install control program reads a program from a storage medium which stores programs, and transfers the read program. In this event, plural buffers are used for communication of data associated with the reading and transferring of the program.
A node receiving the program executes an install control program for receiving the distributed data. The install control program receives data on the program, which is to be executed in the node, from the distribution node, and writes the received data into a storage device of the node itself. Plural buffers are utilized for communication of data during the reception of data and the writing into the storage device.
The master install control program and the install control program rely on the buffers to process in parallel the reading of the program from the recording medium, the delivery of the read program, the reception of the program, and the writing of the program into the storage device, to reduce time required for installing the program into plural nodes.
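The buffered, four-stage flow described above (read, deliver, receive, write) can be sketched as a small in-process simulation. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: threads stand in for the nodes, bounded queues stand in for the plural buffers and for the communication means, and every name (`install_all`, `CHUNK_SIZE`, `NUM_BUFFERS`, and so on) is hypothetical.

```python
import io
import queue
import threading

CHUNK_SIZE = 4    # hypothetical "predefined amount" of data, in bytes
NUM_BUFFERS = 2   # hypothetical number of buffers between two stages
_END = None       # sentinel marking the end of the program data


def master_read(medium, read_q):
    """Master side: read the program from the medium chunk by chunk."""
    while True:
        chunk = medium.read(CHUNK_SIZE)
        if not chunk:
            break
        read_q.put(chunk)  # blocks while all read buffers are full
    read_q.put(_END)


def master_deliver(read_q, links):
    """Master side: hand every chunk to every node's link.

    The loop is a stand-in for a simultaneous transfer over the
    communication means interconnecting the nodes.
    """
    while True:
        chunk = read_q.get()
        for link in links:
            link.put(chunk)
        if chunk is _END:
            break


def node_receive(link, write_q):
    """Node side: receive chunks into the node's own buffers."""
    while True:
        chunk = link.get()
        write_q.put(chunk)
        if chunk is _END:
            break


def node_write(write_q, storage):
    """Node side: write buffered chunks into the node's storage device."""
    while True:
        chunk = write_q.get()
        if chunk is _END:
            break
        storage.write(chunk)


def install_all(program: bytes, num_nodes: int):
    """Run all stages concurrently and return each node's stored image."""
    medium = io.BytesIO(program)
    read_q = queue.Queue(maxsize=NUM_BUFFERS)
    links = [queue.Queue(maxsize=NUM_BUFFERS) for _ in range(num_nodes)]
    write_qs = [queue.Queue(maxsize=NUM_BUFFERS) for _ in range(num_nodes)]
    storages = [io.BytesIO() for _ in range(num_nodes)]

    threads = [
        threading.Thread(target=master_read, args=(medium, read_q)),
        threading.Thread(target=master_deliver, args=(read_q, links)),
    ]
    for link, write_q, storage in zip(links, write_qs, storages):
        threads.append(threading.Thread(target=node_receive, args=(link, write_q)))
        threads.append(threading.Thread(target=node_write, args=(write_q, storage)))

    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [s.getvalue() for s in storages]


if __name__ == "__main__":
    images = install_all(b"example program image", num_nodes=3)
    print(all(image == b"example program image" for image in images))
```

Because each queue is bounded, a stage blocks only when its downstream buffers are full, which lets reading, delivery, reception and writing of different chunks overlap in time.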
In an environment in which the present invention is implemented under the best condition, the installation time is calculated as follows. Assume, for example, that the number of nodes is N; the total data size of a program to be distributed is A; the predefined amount of data for distribution is B; the time required for reading the predefined amount of data is C; the time required for transferring the predefined amount of data to all nodes is D; the time required for receiving the predefined amount of data is E; and the time required for writing the predefined amount of data into an external storage device is F. The time required for installing the program into all nodes is then expressed by ((A/B)×F)+(C+D+E). The term (C+D+E) is the time taken to read, transfer and receive the first predefined amount of data before it can be written into the external storage device. For the subsequent data, the data read processing, the transfer-to-node processing and the data reception processing are performed through the buffers in parallel with the writing, so that the time required for this processing is included in the time required for writing data into the storage device.
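The expression ((A/B)×F)+(C+D+E) can be checked with a short calculation. The sketch below is illustrative only: the function and parameter names are assumptions, the sample values are invented, and the formula is taken to hold under the best condition described above, in which reading, transferring and receiving are fully overlapped with writing.

```python
def pipelined_install_time(total_size, chunk_size,
                           read_time, transfer_time,
                           receive_time, write_time):
    """Evaluate ((A/B) x F) + (C + D + E).

    total_size    -- A, total data size of the program
    chunk_size    -- B, predefined amount of data per distribution
    read_time     -- C, time to read one predefined amount
    transfer_time -- D, time to transfer one amount to all nodes
    receive_time  -- E, time to receive one amount
    write_time    -- F, time to write one amount to storage
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    num_chunks = total_size / chunk_size          # A/B
    startup = read_time + transfer_time + receive_time  # C+D+E
    return num_chunks * write_time + startup


if __name__ == "__main__":
    # Hypothetical values: A=1000, B=10, C=1, D=2, E=1, F=3.
    print(pipelined_install_time(1000, 10, 1, 2, 1, 3))
```

The number of nodes N is not a parameter of the function, which mirrors the fact that N does not appear in the expression.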
As described above, since a program is distributed to all nodes at one time, time required for installing the program into all the nodes does not depend on the number of nodes N.