The present invention relates to executing software on a parallel, distributed data processing system, and more particularly relates to performing a software operation on one or more nodes of a parallel, distributed data processing system.
U.S. Pat. No. 5,359,730 issued Oct. 25, 1994 to Marron for METHOD OF OPERATING A DATA PROCESSING SYSTEM HAVING A DYNAMIC SOFTWARE UPDATE FACILITY discloses non-disruptive installation of updated portions of a computer operating system while that operating system continues to run and to support the application load on the system.
U.S. Pat. No. 5,421,009 issued May 30, 1995 to Platt for METHOD OF REMOTELY INSTALLING SOFTWARE DIRECTLY FROM A CENTRAL COMPUTER and deals with remote installation of software on a computer system. Disclosed is a method of installing a client portion of client-server software on client nodes without first manually preparing those client nodes with any type of software such as download software.
U.S. Pat. No. 5,471,617 issued Nov. 28, 1995 to Farrand et al. for COMPUTER MANAGEMENT SYSTEM AND ASSOCIATED MANAGEMENT INFORMATION BASE and discloses a method of managing a plurality of networked manageable devices, with a management information base for use in managing hardware objects.
U.S. Pat. No. 5,555,416 issued Sep. 10, 1996 to Owens et al. for AUTOMATED SOFTWARE INSTALLATION AND OPERATING ENVIRONMENT CONFIGURATION FOR A COMPUTER SYSTEM BASED ON CLASSIFICATION RULES is directed to remote, automated, rules-based installation of software products on a computer system, and configuration of the operating environment of the computer system.
AIX NETWORK INSTALLATION MANAGEMENT GUIDE AND REFERENCE, SC23-1926-00, available from International Business Machines Corporation, provides information about managing the installation and configuration of software by using a network interface. Network Installation Management (NIM) enables the centrally managed installation of the AIX base operating system, the IBM version of the UNIX operating system, and optional software on machines within a networked environment.
The installation of operating system software on parallel, distributed computing system hardware is typically a complex and time-consuming procedure. For a modern, full-function operating system such as the AIX operating system, numerous files must be placed on the system. In addition, numerous files must be newly created or updated, numerous procedures must execute to successful completion on the involved systems, and other complex functions must be completed. NIM provides the base function to install a single system remotely, that is, without requiring direct interaction with the target system. The IBM Parallel System Support Program (PSSP version 2.1) utilizes NIM to provide parallel, remote installation of multiple systems. PSSP installation provides automated installation of multiple systems from a single point of control. Much of the PSSP function is embodied in a single program which invokes numerous NIM, Kerberos and other PSSP functions to configure the installation server system and prepare it to install its client system(s).
However, due to the complexity of the installation process, the networking requirements of both the master and client (target) systems, and the complexity of configuring the installation server system, the installation of a remote system can fail for any of a large variety of reasons. In particular, the installation server configuration function is contained within a single program which does not record the various states through which the server has progressed. If the installation fails, the server and client systems can therefore be left in a state that requires significant detailed analysis and manual intervention before the systems are restored to their previous states. Because the states through which the server has progressed are not recorded, it is not always possible to correct the initial problem and simply rerun the program; careful analysis and effort are instead needed to restore the server to its original state. Even in cases where the server configuration program can be rerun, rerunning all configuration steps consumes unnecessary time and resources when only the remaining steps need be completed.
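The cost of not recording progress can be illustrated with a minimal sketch. The code below is not taken from PSSP or NIM; the function name run_steps, the JSON state file, and the step names are all hypothetical and are used here only to show how recording each completed step would let a rerun resume with the remaining steps instead of repeating the whole configuration.

```python
import json
import os


def run_steps(steps, state_path):
    """Run (name, func) pairs in order, skipping steps already recorded as done.

    Hypothetical illustration: the names of completed steps are kept in a JSON
    list at state_path, so a later call can resume after a failure.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    executed = []
    for name, func in steps:
        if name in done:
            continue  # completed in an earlier run; do not repeat it
        func()  # may raise; steps recorded so far stay in the state file
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)  # record progress after every successful step
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical step names, for illustration only.
    demo_steps = [
        ("define_resources", lambda: None),
        ("configure_clients", lambda: None),
    ]
    print(run_steps(demo_steps, "server_config_state.json"))
```

If a step raises an exception, the state file still lists every step that finished before it, so once the underlying problem is corrected a second call with the same state file executes only the failed step and those after it. A single program that keeps no such record has no equivalent way to resume.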