Distributed computing systems are increasingly being utilized to support high-performance computing applications. Typically, distributed computing systems are constructed from a collection of computing nodes that combine to provide a set of processing services to implement the high-performance computing applications. Each of the computing nodes in the distributed computing system is typically a separate, independent computing device interconnected with each of the other computing nodes via a communications medium, e.g., a network.
One of the challenges with distributed computing systems is the management of the software images associated with the computing nodes. The term “software image” generally refers to the complete set of software associated with an individual computing node, including the operating system and all boot code, middleware and application files.
One challenge arises when a new computing node is allocated to the distributed computing system. In this situation, the computing node must be loaded with a software image that is consistent with the current software images running on the related computing nodes. Similarly, it may be necessary to load a new software image on a computing node when the functions of the computing node change within the distributed computing system.
Moreover, other challenges arise when installing new software, updating software versions or applying software patches to the software images associated with the computing nodes. For example, it is often necessary to shut down and reboot a computing node when installing or updating software on the computing node. In some situations, it may even be necessary to take the entire distributed computing system offline when performing substantial software installations or updates. As a result, the computing performance of the distributed computing system may be severely impacted during the installation or update process.