1. Technical Field
This application relates to the field of storing data, and more particularly to the field of providing services in connection with data storage.
2. Description of Related Art
Cloud computing is Internet-based computing in which shared resources, software, and information are provided to computers and other devices on demand, like a public utility. A cloud may be a hybrid system consisting of physical and virtual compute and storage resources that offer various levels of compute services, from the infrastructure level to the platform and application-framework levels. In addition, computations in the cloud may be performed by a collection of distributed applications that cooperate in performing computational tasks. The distributed applications may be provided by a collection of storage objects, some of which are executable. Due to the requirements of the application runtime environments, some of the storage objects must be co-located (e.g. the virtual disks of a virtual machine must all be placed in the same virtual infrastructure), and some of the objects do not have to be co-located (e.g. the virtual disks and the application datasets that reside in cloud storage). Also, when a distributed application executes, code from some executable objects communicates heavily with code from certain other executable objects (tightly coupled computation), while communicating less with code from still other executable objects (loosely coupled computation). Examples of executable objects include virtual disks that contain OS and application code, software components (e.g. OSGi bundles, Java or Python packages), and software distribution packages (e.g. RPM packages). Examples of data objects include data virtual disks that contain database and filesystem volumes, virtual machine descriptor files, application descriptor files, and archived and/or compressed collections of data objects for use by applications.
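By way of illustration only, the distinction between storage objects that must be co-located and those that need not be may be sketched as follows. The sketch assumes a hypothetical `StorageObject` record with an optional `colocation_group` label and a hypothetical `violated_groups` helper; these names, the site names, and the example objects are not taken from any particular system.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Kind(Enum):
    EXECUTABLE = "executable"  # e.g. a virtual disk holding OS and application code
    DATA = "data"              # e.g. a virtual disk holding a database volume


@dataclass(frozen=True)
class StorageObject:
    name: str
    kind: Kind
    # Hypothetical label: objects sharing a group must be placed together.
    # None means the object has no co-location requirement.
    colocation_group: Optional[str] = None


def violated_groups(objects: List[StorageObject],
                    placement: Dict[str, str]) -> List[str]:
    """Return the co-location groups whose members span more than one site."""
    sites_by_group = defaultdict(set)
    for obj in objects:
        if obj.colocation_group is not None:
            sites_by_group[obj.colocation_group].add(placement[obj.name])
    return sorted(g for g, sites in sites_by_group.items() if len(sites) > 1)


# Hypothetical example: two virtual disks of one virtual machine plus a
# dataset that resides in cloud storage and carries no co-location requirement.
objects = [
    StorageObject("vm1-os-disk", Kind.EXECUTABLE, "vm1"),
    StorageObject("vm1-data-disk", Kind.DATA, "vm1"),
    StorageObject("shared-dataset", Kind.DATA),
]
placement_ok = {"vm1-os-disk": "site-a", "vm1-data-disk": "site-a",
                "shared-dataset": "site-b"}
placement_bad = {"vm1-os-disk": "site-a", "vm1-data-disk": "site-b",
                 "shared-dataset": "site-b"}

print(violated_groups(objects, placement_ok))
print(violated_groups(objects, placement_bad))
```

In this sketch the first placement keeps both virtual disks of the virtual machine at one site, while the second splits them and is therefore reported as violating the co-location requirement.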
One model of the Cloud is a two-tier system. The first tier comprises geographically dispersed compute resources with relatively small amounts of internal storage. This storage is used for hosting virtual/physical machines as they execute on the virtualized/physical compute resources. The second tier comprises geographically dispersed storage resources with large amounts of storage generally optimized for storing infrequently modified data (e.g. data at rest and snapshots of virtual machines running on the compute resources). The data in the Cloud includes virtual machine disk images, both those containing executable code (guest OS, applications) and those containing datasets (e.g. database LUNs and filesystem images). Cloud storage provides a variety of storage services for data protection, high availability, disaster recovery, etc. This usually means that replicas of the data in the Cloud are present in several geographies.
The Cloud is a heavily distributed system that includes collections of compute and storage resources in geographically dispersed locations. In addition, computations in the Cloud may be performed by collections of virtual machines that process shared datasets. The virtual machines' storage footprint and the dataset sizes are often sufficiently large to discourage movement of these objects across network localities. In any case, it may be difficult to determine when movement might be advantageous.
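By way of illustration only, the effect of object size on movement across network localities may be approximated with simple arithmetic. The sketch below assumes a hypothetical `transfer_seconds` helper, and the object size and link speeds are hypothetical values chosen for the example. It considers only raw link bandwidth and ignores latency, protocol overhead, and contention, so it gives a lower bound on transfer time rather than a prediction.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Lower-bound time to move an object over a link of the given bandwidth."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


# Hypothetical example: a 500 GB virtual machine footprint.
size = 500 * 10**9          # bytes
fast_link = 10**9           # 1 Gbit/s
slow_link = 100 * 10**6     # 100 Mbit/s

print(transfer_seconds(size, fast_link) / 3600)  # hours on the faster link
print(transfer_seconds(size, slow_link) / 3600)  # hours on the slower link
```

Under these assumed values the transfer takes at least 4,000 seconds (over an hour) on the faster link and at least 40,000 seconds (over eleven hours) on the slower link.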
Accordingly, it is desirable to provide a system that provides for optimal co-location of compute and data resources within the Cloud.