Distributed file systems offer many compelling advantages in establishing high-performance computing environments. One example is the ability to expand easily, even at large scale. For instance, a distributed file system can operate under a cluster-of-nodes topology, whereby clients can connect to any node among the cluster of nodes to perform file system activity. Individual nodes among the cluster of nodes can each contain their own processor(s), storage drives, memory, and the like. Operating together in a cluster, the nodes can respond to client requests, store data, mirror data, and accomplish all the tasks of a modern file system. A cluster of nodes, in some cases, can provide easy scalability by allowing new nodes to be added to the cluster of nodes to increase the amount of storage space within the distributed file system and/or to meet other needs of the users of the distributed file system.
Some distributed file systems can also regularly sync with a backup cluster of nodes. The backup cluster of nodes can operate as a file system independent of the primary cluster of nodes. Data can be cloned on the backup cluster of nodes and periodically updated to include changes made on the primary cluster of nodes. For example, using a snapshot-based change identification model to identify changed files of the file system, block-based updates can be made to the backup cluster of nodes that require sending only the modified portions of files to the backup cluster of nodes. A backup cluster of nodes can provide a safe backup for disaster recovery in the event that the primary cluster of nodes has suffered total failure. However, for some use cases, it may not be economical or feasible to locate a backup cluster of nodes geographically separate from the primary cluster of nodes such that a potential disaster does not affect the backup cluster of nodes as well.
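By way of a non-limiting illustration, the snapshot-based, block-level update described above can be sketched as follows. This is a minimal sketch under stated assumptions and not a description of any particular file system: each snapshot is modeled as an in-memory mapping from file path to file contents, blocks are fixed-size, and changed blocks are found by comparing SHA-256 digests. The names `BLOCK_SIZE`, `diff_snapshots`, and `apply_changes` are hypothetical. A production system would more likely read changed-block information from snapshot metadata than rehash file contents.

```python
import hashlib

# Hypothetical block size in bytes; kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file contents into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def diff_snapshots(old: dict, new: dict, block_size: int = BLOCK_SIZE) -> dict:
    """Compare two snapshots (path -> bytes) and return only what must be sent."""
    changes = {"deleted": sorted(set(old) - set(new)), "updated": {}}
    for path, data in new.items():
        old_data = old.get(path, b"")
        old_digests = [hashlib.sha256(b).digest()
                       for b in split_blocks(old_data, block_size)]
        modified = {}
        for index, block in enumerate(split_blocks(data, block_size)):
            digest = hashlib.sha256(block).digest()
            # A block is sent if it is new or its digest differs from the old one.
            if index >= len(old_digests) or old_digests[index] != digest:
                modified[index] = block
        # Record new files and pure truncations even when no block changed.
        if modified or path not in old or len(data) != len(old_data):
            changes["updated"][path] = {"length": len(data), "blocks": modified}
    return changes


def apply_changes(backup: dict, changes: dict, block_size: int = BLOCK_SIZE) -> dict:
    """Apply a change set produced by diff_snapshots to the backup copy in place."""
    for path in changes["deleted"]:
        backup.pop(path, None)
    for path, update in changes["updated"].items():
        buf = bytearray(backup.get(path, b""))
        length = update["length"]
        if len(buf) < length:
            buf.extend(b"\0" * (length - len(buf)))  # grow; new blocks overwrite padding
        else:
            del buf[length:]  # shrink to the new file length
        for index, block in update["blocks"].items():
            start = index * block_size
            buf[start:start + len(block)] = block
        backup[path] = bytes(buf)
    return backup


if __name__ == "__main__":
    previous = {"a.txt": b"aaaabbbbcccc", "b.txt": b"gone"}
    current = {"a.txt": b"aaaaXXXXcccc", "c.txt": b"new"}
    change_set = diff_snapshots(previous, current)
    backup_copy = apply_changes(dict(previous), change_set)
    print(backup_copy == current)
```

In this sketch, only the second block of `a.txt` and the contents of the new file `c.txt` are carried in the change set, which reflects the point that block-based updates avoid resending unmodified portions of files.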
One common approach to disaster recovery is to use offsite tape backup. Using a protocol such as the Network Data Management Protocol (NDMP), a file system can dump an image of its data onto a physical tape backup, and that tape backup can then be stored at a location geographically separate from the file system. However, tape backup can consume resources and time, making it difficult to scale. One means for addressing this is to use cloud storage services as the backup target. It can be appreciated that cloud storage systems can be operated independently of the primary cluster of nodes and are thereby less prone to a single disaster event affecting both the primary cluster of nodes and the cloud storage backup. It can also be appreciated that cloud storage services can be more efficient and more convenient than tape backups.