The advent of cloud-based computing architectures has opened new possibilities for the rapid and scalable deployment of virtual Web stores, media outlets, social networking sites, and many other online sites or services. In general, a cloud-based architecture deploys a set of hosted resources, such as processors, operating systems, software, and other components, that can be combined to form virtual machines. A user or customer can request, from a central server or cloud management system, the instantiation of a virtual machine or set of machines from those resources to perform intended tasks, services, or applications. For example, a user may wish to set up and instantiate a virtual server from the cloud to create a storefront to market products or services on a temporary basis, for instance, to sell tickets to or merchandise for an upcoming sports or musical performance. The user can subscribe to the set of resources needed to build and run the set of instantiated virtual machines on a comparatively short-term basis, such as hours or days, for their intended application.
Typically, when a user utilizes a cloud, the user must track the software applications executed in the cloud and/or the processes instantiated in the cloud. For example, the user must track the cloud processes to ensure that the correct cloud processes have been instantiated, that the cloud processes are functioning properly and/or efficiently, that the cloud is providing sufficient resources to the cloud processes, and so forth. Due in part to the user's requirements and overall usage of the cloud, the user may have many applications and/or processes instantiated in a cloud at any given instant, and the user's deployment of virtual machines, software, and other resources can change dynamically over time. In some cases, the user may also utilize multiple independent clouds to support the user's cloud deployment. That user may further instantiate and use multiple applications or other software or services inside or across those cloud boundaries, and those resources may be used or consumed by multiple or differing end-user groups in those different cloud networks.
For various reasons, an administrator or other user may wish to consider transporting or migrating a set of data in cloud-hosted storage of one cloud provider to cloud storage associated with another cloud provider. For example, the other cloud provider may offer more storage, better subscription rates, and/or other benefits. In some cases, the administrator may already have a large amount of data in the cloud-hosted storage. For example, in relatively large-scale arrangements, such as those maintained by hospitals, government agencies, financial institutions, or other entities, the amount of data that needs to be transported or migrated may be in the range of terabytes, petabytes, or more. For such comparatively large-scale data installations, transporting or migrating the data to another cloud provider over public Internet connections, such as packet-switched TCP/IP (transmission control protocol/Internet protocol) or FTP (file transfer protocol) connections, could require days or weeks to deliver the data payload.
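The scale of that delay follows from simple arithmetic: transfer time is roughly the payload size in bits divided by the sustained link rate. The following Python sketch illustrates the estimate; the function name `transfer_time_days` and its `efficiency` parameter (a hypothetical factor for protocol overhead and contention) are assumptions for illustration and are not part of the described system.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(payload_bytes: float,
                       link_bits_per_second: float,
                       efficiency: float = 1.0) -> float:
    """Estimate the days needed to move a payload over a network link.

    Hypothetical helper: `efficiency` (0 < efficiency <= 1) scales the
    nominal link rate down to an assumed sustained throughput.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Convert bytes to bits, then divide by the effective rate in bits/s.
    seconds = payload_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# 100 TB and 1 PB over an ideal, fully utilized 1 Gbit/s link.
print(round(transfer_time_days(10**14, 10**9), 1))  # 9.3
print(round(transfer_time_days(10**15, 10**9), 1))  # 92.6
```

Even under the idealized assumption of a fully utilized 1 Gbit/s link, a 100-terabyte payload takes more than nine days, and any realistic efficiency below 1.0 lengthens the estimate proportionally.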
For many organizations, that type of migration delay may be impractical or impossible. In addition, the relatively narrow-bandwidth connections available over the public Internet may not be secure, and for sensitive data or applications, the use of such connections may not be a valid or practical option. Moreover, narrow-bandwidth connections into host or target storage clouds may not allow for data management services such as error correction, in-flight encryption, or other security or management options. Further, an administrator or other entity associated with a data payload in a source cloud may migrate that data payload to a target cloud without realizing that a replicated version of the data payload is already stored on a service associated with the target cloud. As a result, the administrator may squander processing and transfer time that could have been saved by transferring the replicated version of the data payload to the target cloud instead.
Therefore, it may be desirable to provide systems and methods for identifying data that is consistent with data in the cloud-hosted storage and that is already replicated on cloud data distribution sites. In particular, it may be desirable to provide systems and methods for locating consistent data on sites connected to a target cloud, and thereby bypass a full-scale migration of data from a source cloud to the target cloud.
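The text above does not specify how consistency between the source data and a replicated copy is determined. One common approach, assumed here purely for illustration, is to compare cryptographic content digests (SHA-256) of each object. In the sketch below, the payload layout (a mapping of object names to bytes), the per-site manifest format, and the names `consistent_sites` and `objects_to_transfer` are all hypothetical; the sketch reports which sites hold a fully consistent replica and, for a partial replica, which objects would still need to be transferred.

```python
import hashlib
from typing import Dict, List

# Hypothetical formats: a payload maps object names to their contents, and a
# site manifest maps object names to the SHA-256 hex digest held at that site.
Payload = Dict[str, bytes]
Manifest = Dict[str, str]


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an object's contents."""
    return hashlib.sha256(data).hexdigest()


def manifest(payload: Payload) -> Manifest:
    """Build a name-to-digest manifest for a source payload."""
    return {name: digest(blob) for name, blob in payload.items()}


def objects_to_transfer(payload: Payload, site_manifest: Manifest) -> List[str]:
    """List source objects that are missing from, or differ at, a site."""
    expected = manifest(payload)
    return sorted(name for name, d in expected.items()
                  if site_manifest.get(name) != d)


def consistent_sites(payload: Payload,
                     site_manifests: Dict[str, Manifest]) -> List[str]:
    """List sites whose replica matches every object in the source payload."""
    return sorted(site for site, m in site_manifests.items()
                  if not objects_to_transfer(payload, m))


source = {"records/a.dat": b"alpha", "records/b.dat": b"beta"}
sites = {
    "edge-1": manifest(source),                              # full, current replica
    "edge-2": {"records/a.dat": digest(b"alpha")},           # missing b.dat
    "edge-3": {**manifest(source), "records/b.dat": digest(b"stale")},  # stale b.dat
}
print(consistent_sites(source, sites))                # ['edge-1']
print(objects_to_transfer(source, sites["edge-2"]))   # ['records/b.dat']
```

Comparing digests rather than full contents means only small manifests need to cross the slow link to decide whether a site connected to the target cloud can supply the data, which is the saving that motivates bypassing a full-scale migration.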