Distributed file systems offer many compelling advantages in establishing high performance computing environments. One example is the ability to expand easily, even at large scale. Another example is the ability to support multiple distinct network protocols. In one example, a distributed file system can operate in a cluster-of-nodes topology, whereby clients can connect to any node among the cluster of nodes to perform file system activity. Each node among the cluster of nodes can contain its own processor(s), storage drives, memory, and the like. Operating together in a cluster, the nodes can respond to client requests, store data, mirror data, and accomplish all the tasks of a modern file system. In some cases, a cluster of nodes can provide easy scalability by allowing new nodes to be added to the cluster, increasing the amount of storage space within the distributed file system and/or meeting other needs of its users.
One demand that users of a distributed file system likely have is to avoid any single point of failure in critical user workflows. For example, if a storage device within one of the nodes fails, users expect the data to be usable from a secondary source, with as little disruption as possible. This is one reason why data is mirrored across more than one storage device and more than one node. If a drive or a node fails, a client can still find the data it seeks on a different drive and/or connect to a different node. Because businesses depend on the reliability of their data storage systems to serve their customers, many businesses expect a distributed file system to continue to operate every hour of every day throughout the year. However, when an administrator of a distributed file system operating within a cluster of nodes wishes to upgrade the file system to a new version, the process can disrupt users of the file system. For example, if every node of the file system needed to be upgraded simultaneously, clients would be unable to connect to a node and access data stored within the file system during the upgrade process. However, if nodes are upgraded one by one, nodes running two different versions of the operating software may be incompatible with each other. In addition, once a node or the cluster of nodes has been upgraded, an administrator may wish to roll back to the previous version for any number of reasons. Therefore, there exists a need for non-disruptive upgrade and rollback capabilities for a cluster of nodes operating as a distributed file system that maintain continuous availability to clients of the file system while minimizing disruptions to their workflows.