In an Infrastructure as a Service (IaaS) cloud computing service such as Amazon EC2 (Elastic Compute Cloud), ISAAC (IBM Service Agility Accelerator for Cloud), or RHEV (Red Hat Enterprise Virtualization), elastic virtual machine instances can be provisioned with high scalability. Elastic virtual machine instances are launched from a master virtual machine image in response to users' requests. The master virtual machine image represents a specific configuration of hardware resources (such as computing and storage), an operating system platform, and applications. All the elastic virtual machine instances launched from a master virtual machine image share the common files and data of that image, and each virtual machine instance stores only its minimal modified data locally using the Copy-On-Write technique, thus reducing maintenance costs and disk usage. In addition, applications can be pre-installed and pre-configured in the master virtual machine image, so that each user does not have to install and configure applications separately in the launched virtual machine instance, thus saving considerable time and effort, lowering maintenance and storage costs, and shortening the time to value.
However, in such an IaaS solution, all the computing resources are divided into two domains: a computing domain and a storage domain. The computing domain consists of computing nodes and provides computing resources (mainly CPUs, memory and network cards); all the data therein, however, is deemed disposable. The storage domain consists of storage nodes and provides storage resources for storing data that the user deems worth retaining. An elastic virtual machine instance runs on a computing node, and any data changes that it stores on the computing node using the Copy-On-Write technique are not persistent. If the hardware on which the instance is running fails, or the instance is terminated or shut down, all the newly generated data will be lost. This is because the elastic virtual machine instance runs on a local temporary image on the computing node, based on the master virtual machine image, and the changed data in the temporary image is not stored in the storage domain. The master virtual machine image itself does not allow write-back, since it is read-only and shared by many users.
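The behavior described above can be illustrated with a minimal sketch. It is a toy model only, not the implementation used by any particular cloud: the class name `CowInstance`, the block numbering, and the dictionary-based image are all assumptions made for illustration.

```python
from types import MappingProxyType


class CowInstance:
    """Toy model of an elastic instance backed by a read-only master image."""

    def __init__(self, master_blocks):
        # The master image is shared and read-only; the proxy forbids writes.
        self._master = MappingProxyType(master_blocks)
        # Local temporary image on the computing node: changed blocks only.
        self._overlay = {}

    def read(self, block_no):
        # Changed blocks shadow the master; unchanged blocks are shared.
        if block_no in self._overlay:
            return self._overlay[block_no]
        return self._master[block_no]

    def write(self, block_no, data):
        # Copy-on-write: the change lands in the overlay, never in the master.
        self._overlay[block_no] = data

    def terminate(self):
        # The overlay exists only on the computing node, so it is discarded.
        self._overlay.clear()


# Hypothetical three-block master image shared by two instances.
master = {0: b"os", 1: b"app", 2: b"config"}
a = CowInstance(master)
b = CowInstance(master)
a.write(2, b"user-data")
```

In this sketch, instance `a` sees its own change while instance `b` and the master image are unaffected; calling `a.terminate()` drops the overlay, which mirrors the loss of newly generated data when an instance is shut down.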
Such changed data, generated after the virtual machine instance is launched, represents the user's actual business data and thus should be stored in a storage volume for the user in the cloud. Currently, cloud services provide persistent storage solutions, namely the storage domain in a cloud environment as described above, so that a user can store any data that the user wishes to keep. For example, Amazon provides persistent storage through Amazon EBS (Elastic Block Store), and ISAAC provides the Volume solution. A user can create any number of storage volumes, attach a storage volume to a virtual machine instance as a raw block storage device (like an unformatted hard disk), format it with any file system, and mount it to a file directory or logical disk. When the elastic virtual machine instance is terminated, the storage volume remains on the storage node and can therefore be re-attached to a newly launched elastic virtual machine instance. However, such a persistent storage solution is applicable only to the data of an application that is newly installed and configured after the elastic virtual machine instance has been launched and the storage volume has been created, attached, formatted and mounted. In the prior art, only at that point can an application be installed and configured so that the data generated while it runs is stored persistently in the volume. For applications that are already pre-installed and pre-configured in the master virtual machine image, the data generated while they run cannot be stored in the volume.
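The volume workflow and its limitation can be sketched as follows. This is a simplified model under stated assumptions, not the API of Amazon EBS or ISAAC: the `Volume` and `Instance` classes, the `ext4` default, and the file paths are all hypothetical, and attaching and mounting are collapsed into a single `mount` step.

```python
class Volume:
    """Toy persistent storage volume residing on a storage node."""

    def __init__(self):
        self.filesystem = None  # raw, unformatted block device at first
        self.files = {}         # survives instance termination


class Instance:
    """Toy elastic instance with ephemeral local storage."""

    def __init__(self):
        self.local_files = {}   # lives on the computing node only
        self.mounts = {}        # mount point -> Volume

    def mount(self, volume, mount_point, filesystem="ext4"):
        # Format only a raw volume; a re-attached volume keeps its data.
        if volume.filesystem is None:
            volume.filesystem = filesystem
        self.mounts[mount_point.rstrip("/") + "/"] = volume

    def write(self, path, data):
        # Data under a mount point goes to the volume; all else stays local.
        for mount_point, volume in self.mounts.items():
            if path.startswith(mount_point):
                volume.files[path[len(mount_point):]] = data
                return
        self.local_files[path] = data

    def terminate(self):
        # Volumes are detached but kept; local data is discarded.
        self.mounts.clear()
        self.local_files.clear()


vol = Volume()
first = Instance()
first.mount(vol, "/data")
# An application installed after mounting is configured to write to /data.
first.write("/data/orders.db", b"new app data")
# A pre-installed application keeps writing to its original location.
first.write("/opt/preinstalled/app.db", b"preinstalled app data")
first.terminate()

second = Instance()
second.mount(vol, "/data")
```

After `first` is terminated, only `orders.db` is still available to `second` through the re-attached volume; the data written by the pre-installed application is gone, which is the gap described in the paragraph above.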
In order to provide persistent storage of user data for a software application that is pre-installed and pre-configured in a master virtual machine image, cloud service providers presently offer the following three solutions:
1) Install and configure all the software applications in one master virtual machine image or a set of such images; after launching virtual machine instances, configure a database or file system backup solution for each virtual machine instance so that its data is backed up to a storage volume.
Such a solution effectively realizes the sharing of pre-installed and pre-configured software applications, so a virtual machine instance with the configured applications can be launched quickly. However, if a virtual machine instance is terminated, all the data generated since the last backup operation will be lost. Moreover, such a solution requires a separate backup solution to be managed for each virtual machine instance, even though the cloud service provider always provides an overall backup solution for the entire storage to ensure data security.
2) Launch one virtual machine instance containing only an operating system; create storage volumes and attach them to the instance; then install software applications and configure them to store their persistent data directly in the storage volumes.
This solution is in fact carried out manually by many users, and it is also easy to automate. However, it cannot provide a virtual machine with installed applications quickly, because installing applications takes a long time, especially for large, complex software applications. For example, it takes about 10 hours to finish installing the IBM Maximo Asset Management application v7.5, and it may take about 80 hours to install a more complex solution. Furthermore, the application installation cannot be shared between different users, so each user needs to install the desired applications separately. Even for the same user, if the virtual machine instance dies, the application must be installed again.
3) Some cloud solutions provide a persistent virtual machine instance to resolve this issue. In this case, when a virtual machine instance needs to be launched, a new image (storage volume) is copied from the master virtual machine image on the storage node, and the computing node connects to that image remotely. The computing node no longer generates a temporary image using the Copy-On-Write technique. Thus, application data is stored in the storage volume in the storage domain, and it persists when the virtual machine or the host computer is shut down.
This solution needs to copy the whole master virtual machine image, which takes a long time. For example, copying a 100 GB master virtual machine image takes about half an hour. In addition, because the image in the storage volume is independent of the master image once copying is finished, the system administrator needs extra effort to maintain this image with OS patches, application patches, etc. So if there are 1000 instances, for example, the system administrator's maintenance effort increases roughly 1000-fold.
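The figures quoted above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes decimal units (1 GB = 1000 MB), a constant sustained copy rate, and copy time that scales linearly with image size; the 250 GB image is a hypothetical value chosen only for illustration, and the function names are not taken from any product.

```python
def implied_throughput_mb_s(image_gb, copy_minutes):
    """Sustained copy rate in MB/s implied by an image size and a copy time."""
    return image_gb * 1000 / (copy_minutes * 60)


def estimated_copy_minutes(image_gb, throughput_mb_s):
    """Minutes needed to copy an image at a given sustained rate."""
    return image_gb * 1000 / throughput_mb_s / 60


def images_to_patch(instance_count, persistent_copies):
    """Images to maintain: one shared master, or one copy per instance."""
    return instance_count if persistent_copies else 1


# 100 GB in about 30 minutes, as stated in the text.
rate = implied_throughput_mb_s(100, 30)
# Hypothetical larger image copied at the same assumed rate.
longer = estimated_copy_minutes(250, rate)
```

Under these assumptions the quoted example implies a rate of roughly 56 MB/s, so a 250 GB image would need about 75 minutes per launch, and `images_to_patch(1000, True)` reflects the 1000 independent images an administrator would have to patch instead of a single shared master.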
Therefore, there is a need in the art for a solution for managing the persistent data of a pre-installed application in a compute cloud elastic virtual machine instance that can overcome the drawbacks described above.