Distributed computing systems, such as distributed file systems, typically include several system servers or components that are interconnected through a network. The various servers and components of the system run software that controls and manages the various operations of the computing system. Periodically, new upgrades, releases, additions or patches may be created for the software running on the system. For the system to operate properly, this new software must be loaded onto all of the servers and components of the system.
In prior distributed computing systems, upgrading the currently running software requires that the system be temporarily shut down and/or made unavailable. Once the system is stopped, the software upgrades, releases, additions and/or patches are loaded onto all system components, and the system is rebooted. This process undesirably causes the distributed computing system to be inoperable or unavailable for significant periods of time (i.e., until the upgrade is complete throughout the system), thereby preventing users from accessing and operating the system during the upgrade process.
It is therefore desirable to provide a system and method for managing software upgrades in a distributed computing system that allows the system to remain operable and accessible throughout the upgrade process. Accordingly, the present invention provides a system and method for managing software in a distributed computing system having a plurality of nodes, which performs software upgrades in a sequential or “rolling” manner (e.g., node by node), thereby allowing the distributed computing system to remain operable and available throughout the upgrade process.
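By way of illustration only, the sequential or “rolling” manner of upgrading may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Node` class, its `online` flag, and the `rolling_upgrade` function are hypothetical names introduced for this example, and assigning a version string stands in for actually loading software onto a node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one server or component of the system."""
    name: str
    version: str
    online: bool = True


def rolling_upgrade(nodes, new_version):
    """Upgrade nodes one at a time.

    Returns the smallest number of nodes that were online at any point,
    so a caller can check that the system as a whole stayed available.
    """
    min_online = sum(n.online for n in nodes)
    for node in nodes:
        node.online = False              # take only this node out of service
        min_online = min(min_online, sum(n.online for n in nodes))
        node.version = new_version       # placeholder for installing the software
        node.online = True               # return the node to service before moving on
    return min_online


# Illustrative four-node system, upgraded from an assumed "1.0" to "2.0".
cluster = [Node(f"node-{i}", "1.0") for i in range(4)]
lowest = rolling_upgrade(cluster, "2.0")
```

In this sketch only one node is out of service at any moment, so three of the four nodes remain online throughout, in contrast to the prior approach in which every node is stopped at once.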