In a networked virtualization environment for storage management, several nodes (e.g., servers, data centers) share a plurality of storage devices over a network. Each node may include local storage devices (e.g., solid state drives (SSDs)), and the networked virtualization environment may also include several networked storage devices (e.g., cloud storage, storage area network (SAN), network file servers). Nodes within the networked virtualization environment may access networked storage devices and/or local storage devices of other nodes through the network. Likewise, nodes may communicate with one another over the same network.
Each node may host several user virtual machines, and a node may expose virtual disks to its corresponding user virtual machines. To provide optimal storage management functionality to user virtual machines running within the networked virtualization environment, updates may be performed periodically at the nodes so that the most current version of storage management functionality is available to the user virtual machines. To complete an update for a node, the node must be shut down or restarted for a period of time, during which data residing at the node is unavailable. For the networked virtualization environment for storage management to continue operating without error, data that is unavailable at a node currently undergoing an update must remain accessible at some other location within the networked virtualization environment.
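By way of illustration only, the availability requirement described above can be sketched as a check performed before a node is taken down for an update. The sketch assumes a simple map from each data item to the set of nodes holding a copy of it. The function name `can_take_offline`, the `replicas` map, and the node and virtual disk names are hypothetical and are not drawn from any particular implementation.

```python
from typing import Dict, Set


def can_take_offline(node: str,
                     replicas: Dict[str, Set[str]],
                     online: Set[str]) -> bool:
    """Return True if every data item stored on `node` has at least one
    copy on some other node that is currently online."""
    for holders in replicas.values():
        if node not in holders:
            continue  # this item is unaffected by taking `node` down
        other_online_holders = (holders - {node}) & online
        if not other_online_holders:
            return False  # the item would become unavailable
    return True


# Hypothetical placement of virtual disk data across three nodes.
replicas = {
    "vdisk-a": {"node1", "node2"},
    "vdisk-b": {"node2", "node3"},
    "vdisk-c": {"node3"},
}
online = {"node1", "node2", "node3"}

node1_ok = can_take_offline("node1", replicas, online)
node3_ok = can_take_offline("node3", replicas, online)
```

In this example, `node1` may be updated because `vdisk-a` also resides on `node2`, whereas `node3` may not, because it holds the only copy of `vdisk-c`.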
Such updates may also be necessary for other components of the system. For example, the hypervisor that underpins the virtualization system may need to be updated. To complete an upgrade of the hypervisor, the node must be shut down or restarted for a period of time, during which that node is unavailable. As another example, the firmware of a storage device at a node may be upgraded by installing updated firmware on that storage device. During the upgrade process, the storage device is taken offline and remains unavailable until it is brought back up after the upgrade is complete. Although these nodes and devices may be unavailable during the upgrade period, the upgrade must still be performed, since the updates may include necessary bug fixes, required security updates, or the like. The goal, however, is to minimize the impact of the upgrade process on the overall system.
Therefore, what is needed is a mechanism for performing a rolling update in a networked virtualization environment for storage management that optimizes resource availability while minimizing data loss and service loss.