In many modern computing environments, data is copied from one computing system to another computing system to improve system resilience in case of loss of data at one of the computing systems. For example, a point-in-time snapshot of data might be copied from one computing system to another computing system so that the data can be used in the other system. Making such point-in-time snapshots and copying the snapshots to another computing system on some schedule (e.g., a backup schedule, an incremental backup schedule, etc.) can be used to facilitate capabilities that are known as “replication”, “migration”, “cloning”, etc.
This process of making such point-in-time snapshots and holding them in another computing system becomes more complicated when dealing with inter-system movement of data between computing systems known as clusters. A cluster is a computing system comprising multiple nodes that share a common storage area and a common namespace. The namespaces of different computing clusters differ based on the specific configuration of each particular cluster. More specifically, since a computing cluster is a collection of computing nodes where individual ones of the computing nodes access a common storage pool having a single common address space formed of a contiguous set of addresses, variations between the configurations of the storage pools of the clusters can influence the syntax and semantics of the namespace of a particular cluster. Thus, while a particular point-in-time amalgamation of data in the form of backup data in a first cluster might be visible and/or accessible by name in the first cluster, that same amalgamation of data might not be visible or accessible at or by any other cluster. An additional complication arises from the fact that any metadata used in a first cluster to describe the backup data of the first cluster would not be usable within the namespace and/or context of a different cluster.
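The namespace mismatch described above can be illustrated with a minimal sketch. All class and snapshot names below are hypothetical, chosen only to show that a name minted inside one cluster's namespace does not resolve inside another cluster's namespace.

```python
class Cluster:
    """Toy model of a cluster with a private snapshot namespace (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._snapshots = {}  # namespace-local name -> snapshot payload

    def take_snapshot(self, local_name, data):
        # The returned name is meaningful only within this cluster's namespace.
        self._snapshots[local_name] = data
        return local_name

    def resolve(self, local_name):
        # Returns None when the name does not exist in this namespace,
        # mirroring how one cluster cannot see another cluster's snapshot names.
        return self._snapshots.get(local_name)


cluster_a = Cluster("A")
cluster_b = Cluster("B")

snap = cluster_a.take_snapshot("vm1@2024-01-01", b"point-in-time data")

# Visible by name in the cluster that created it, invisible elsewhere.
print(cluster_a.resolve(snap) is not None)  # True
print(cluster_b.resolve(snap) is None)      # True
```

The same asymmetry applies to any metadata keyed by such names: a lookup table built in cluster A is unusable in cluster B's context.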
This situation is further exacerbated when the two clusters perform their respective garbage collection operations independently, such that neither cluster guarantees to the other cluster that any particular snapshot will be available at any future moment in time.
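The consequence of uncoordinated garbage collection can be sketched as follows. In this hypothetical model (all names are illustrative), a target cluster independently applies its own retention policy, so a snapshot the source still holds, and might plan an incremental transfer against, is no longer present at the target.

```python
class Cluster:
    """Toy cluster that garbage-collects under its own retention policy."""

    def __init__(self):
        self.snapshots = {}  # snapshot name -> data (names are hypothetical)

    def add(self, name, data):
        self.snapshots[name] = data

    def garbage_collect(self, keep):
        # Each cluster applies its OWN policy; nothing is coordinated
        # with any other cluster.
        self.snapshots = {n: d for n, d in self.snapshots.items() if n in keep}


source = Cluster()
target = Cluster()
for name in ("snap1", "snap2"):
    source.add(name, b"data")
    target.add(name, b"data")

# The target independently decides to retain only the newest snapshot.
target.garbage_collect(keep={"snap2"})

# The source still holds snap1 and could plan an incremental send based on
# it, but the target can no longer honor that plan.
print("snap1" in source.snapshots)  # True
print("snap1" in target.snapshots)  # False
```

Because neither cluster can rely on the other retaining a shared baseline snapshot, any replication scheme that assumes a common reference point can silently break.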
Unfortunately, this scenario raises the issue that, since the namespaces and contexts differ between clusters, one cluster might not be able to fully utilize backup data from another cluster. That is, backup data that is available at one cluster might not be identifiable by any other cluster, making inter-cluster replication either overly complicated or inefficient. Therefore, what is needed is a better way to manage replication of data between clusters.