The present invention relates generally to storage systems and, more particularly, to server image capacity optimization.
To address the rapid growth of digital data in datacenters, scale-out storage solutions have emerged in place of large monolithic storage devices in order to avoid storage inefficiency. A variety of storage virtualization technologies enable a plurality of devices to present a logically consolidated, large storage capacity to host servers or users, with capacity growing as physical storage devices or media such as HDDs are added. The storage devices are interconnected with one another so that they can mutually reference, share, write, or copy data. Examples of the volume location virtualization technology can be found in U.S. Pat. Nos. 6,098,129 and 7,051,121.
The rapid penetration of virtual server deployment in enterprise datacenters is another major trend in the IT world today. Not only the OS (operating system) itself but also the Virtual Appliance (a packaged program image, composed of an OS, middleware, and an application dedicated to a specific purpose, that can be deployed on a hypervisor) is becoming a major option for virtual server image provisioning. Provisioning in this way is very easy because there are virtual appliance marketplaces (e.g., http://www.vmware.com/appliances) from which the administrator can simply download an image and start it as an appliance.
As virtual server deployment has become easier, virtual server sprawl has become more common. In addition, the size of each virtual server image (including the required middleware and applications) has been growing rapidly due to enriched functionality. As a result of this combination, a datacenter needs a very large storage system to hold virtual server images.
Writeable snapshot technology has the potential to optimize the total size of the virtual server images deployed in a datacenter. For the use of writeable snapshot technology to reduce virtual server image capacity, see, e.g., http://www.3par.com/SiteObjects/A7B11BEED3AC16DF15CA5A82011BF265/3PAR-sp-ds-08.1.pdf. Moreover, Fitzgerald disclosed a technique for managing virtual server instances in US20080134175 A1. That reference mentions a snapshot-like configuration in the system and describes the distribution of virtual server images from a repository to a host. However, Fitzgerald does not teach how to reduce the amount of image data copied during provisioning, as disclosed in the present invention.
A plurality of deployed virtual server images contains a very large portion of redundant data, because the images hold the same OS, middleware, and applications and differ only in a small customized area. A snapshot mechanism therefore works well to eliminate the redundant portion while still presenting the plurality of virtual server images (virtually). The original image is called the “Gold Image.” When a new virtual server instance is needed, a snapshot volume of the Gold Image is created and associated with the virtual server. The snapshot volume is only a logical volume; a read access is served by actually reading the corresponding portion of data on the Gold Image. When a write access arrives, a small capacity chunk is allocated to the snapshot volume to hold the written data, and this is the “customized part” that differs from the original data of the Gold Image. With this virtualization technique, only the customized data is newly allocated, and the original data is not allocated to the respective virtual server image volumes. Thus, the total size of storage is optimized.
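The read and write behavior of a snapshot volume described above can be illustrated with a minimal Python sketch. This is not the implementation of any particular storage product: the class names `GoldImage` and `SnapshotVolume`, the 4-byte `CHUNK_SIZE`, and the sample data are all hypothetical and chosen only to keep the example small.

```python
CHUNK_SIZE = 4  # bytes per chunk; hypothetical and tiny, for illustration only


class GoldImage:
    """The original, shared image, split into fixed-size chunks."""

    def __init__(self, data: bytes):
        if len(data) % CHUNK_SIZE != 0:
            raise ValueError("image size must be a multiple of CHUNK_SIZE")
        self.chunks = [data[i:i + CHUNK_SIZE]
                       for i in range(0, len(data), CHUNK_SIZE)]

    def read_chunk(self, index: int) -> bytes:
        return self.chunks[index]


class SnapshotVolume:
    """A logical volume that allocates capacity only for written chunks."""

    def __init__(self, gold: GoldImage):
        self.gold = gold
        self.allocated = {}  # chunk index -> customized data

    def read_chunk(self, index: int) -> bytes:
        # Customized chunks are served from the snapshot's own allocation;
        # all other reads fall through to the Gold Image.
        if index in self.allocated:
            return self.allocated[index]
        return self.gold.read_chunk(index)

    def write_chunk(self, index: int, data: bytes) -> None:
        if len(data) != CHUNK_SIZE:
            raise ValueError("data must be exactly one chunk")
        if not 0 <= index < len(self.gold.chunks):
            raise IndexError("chunk index out of range")
        # Allocate (or overwrite) a chunk holding only the customized data.
        self.allocated[index] = data

    def allocated_bytes(self) -> int:
        return len(self.allocated) * CHUNK_SIZE


# Two virtual servers share one Gold Image; only vm1 customizes a chunk.
gold = GoldImage(b"OS--MIDLAPPSCONF")
vm1 = SnapshotVolume(gold)
vm2 = SnapshotVolume(gold)
vm1.write_chunk(3, b"VM01")
```

In this sketch, `vm1` consumes one chunk of newly allocated capacity and `vm2` consumes none, while both still present a complete 16-byte image to their virtual servers.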
However, although a writeable snapshot eliminates redundant data within a single storage device, the optimization cannot be applied across separate devices. As described above, the scale-out type of storage system has emerged as the solution for providing huge storage capacity in each datacenter to address the rapid growth of digital data. Therefore, copies of the same Gold Image must actually exist in the respective devices as the root of their snapshots, which again results in a large amount of redundant data from the perspective of the entire datacenter.
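A rough capacity estimate makes the residual redundancy concrete. The sketch below assumes one Gold Image copy per storage device plus a fixed amount of customized data per virtual server; the function names and all figures (a 20 GB Gold Image, 1 GB of customized data, 100 virtual servers, 10 devices) are hypothetical and are not taken from any measured system.

```python
def full_copy_capacity_gb(gold_gb: int, custom_gb: int, num_vms: int) -> int:
    """Capacity when every virtual server holds a full private image."""
    return (gold_gb + custom_gb) * num_vms


def snapshot_capacity_gb(gold_gb: int, custom_gb: int,
                         num_vms: int, num_devices: int) -> int:
    """Capacity with writeable snapshots and one Gold Image per device."""
    return gold_gb * num_devices + custom_gb * num_vms


# Hypothetical figures: 20 GB Gold Image, 1 GB customized data per server.
no_snapshot = full_copy_capacity_gb(20, 1, 100)        # 2100 GB
single_device = snapshot_capacity_gb(20, 1, 100, 1)    # 120 GB
ten_devices = snapshot_capacity_gb(20, 1, 100, 10)     # 300 GB
```

Under these assumptions, snapshots reduce the requirement from 2100 GB to 120 GB on a single device, but spreading the same virtual servers over 10 devices raises it to 300 GB, of which 180 GB is additional copies of the same Gold Image.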