In a cloud computing environment, multiple virtual machines (VMs) usually run on the same host computer. Virtualization allows the underlying host computer to be multiplexed between different virtual machines. The host computer allocates a certain amount of its resources to each of the virtual machines. Each virtual machine is then able to use the allocated resources to execute applications, including operating systems (OS, here referred to as guest operating systems). The software layer providing the virtualization is commonly referred to as a hypervisor and is also known as a virtual machine monitor (VMM), a kernel-based hypervisor or a host operating system. The hypervisor emulates the underlying hardware of the host computer, making the use of the virtual machine transparent to the guest operating system and the user of the computer. Virtual machine disks are often encapsulated into files, making it possible to rapidly save, copy, and provision a virtual machine. Full systems (fully configured applications, operating systems, BIOS and virtual hardware) can be moved, within seconds, from one physical server to another for zero-downtime maintenance and continuous workload consolidation.
A computer environment including memory for the temporary storage of data and disk or other storage for the persistent storage of data is virtualized by providing an abstraction or virtualization layer on the computer environment. One or more server applications are operated on the virtualization layer, each configured to read data from storage into memory and to write data from memory to storage during operation. The virtualization layer provides a representation of resources (such as memory, storage, and the like) within the computer environment to the server applications. One or more server applications are encapsulated within a virtual machine and provided with an OS to manage corresponding virtualized hardware and software resources presented to each server application.
Over the lifetime of a VM, the amount of data stored on the VM's disks grows steadily, often because similar or identical operating system data and/or user data are stored several times on these disks. Identical files may also reside multiple times in the disk caches of a local server. Additionally, I/O utilization may become a bottleneck of a computer system, because the more often a cache is flushed, the more often the server has to access the I/O subsystem. When storage area network (SAN) or network attached storage (NAS) technologies are used, this also results in increased network utilization.
U.S. 2009/0063528 A1 describes a data de-duplication application that is operated in a computer environment to reduce redundant data in memory and/or storage. The de-duplication application identifies redundant data and replaces it with one or more references and/or pointers to a copy of the data that is already present in the memory or storage.
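The general idea of replacing redundant data with references to a single stored copy can be illustrated with a minimal sketch. It is not the implementation of the cited application. The class `DedupStore`, the fixed 4096-byte block size, and the use of SHA-256 digests as references are all assumptions made for illustration only.

```python
import hashlib


class DedupStore:
    """Hypothetical block store that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (the single stored copy)
        self.files = {}   # file name -> list of digests (the references)

    def write(self, name, data, block_size=4096):
        """Split data into fixed-size blocks and store only unseen blocks."""
        refs = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block that is already present is not stored again;
            # the file only records a reference to the existing copy.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        """Rebuild the file contents by following its references."""
        return b"".join(self.blocks[digest] for digest in self.files[name])


store = DedupStore()
payload = b"A" * 4096 + b"B" * 4096
store.write("vm1.img", payload)
store.write("vm2.img", payload)  # identical content adds no new blocks
```

In this sketch, two images with identical content occupy the space of one, because the second write only adds references. A practical system would additionally have to handle reference counting on deletion and the choice of block boundaries, which are left out here.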
U.S. Pat. No. 8,191,065 B2 describes a method and a system for managing images of virtual machines hosted by a server. The system includes a common data storage to store a base virtual machine image shared by the virtual machines, and one or more individual data storages to store incremental images specific to respective virtual machines. The server detects image modifications that are common to the virtual machines, and copies these common modifications to the base virtual machine image in the common data storage. In addition, the server adds pointers to the copied modifications in the common data storage to incremental VM images in the individual data storages.
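The arrangement of a shared base image and per-VM incremental images can be sketched as follows. This is a simplified illustration and not the patented method. The names `ImageServer`, `BasePointer` and `promote_common` are hypothetical, and the sketch assumes that a modification counts as common only when every virtual machine holds identical content for the same block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePointer:
    """Hypothetical marker: the block's content now lives in the base image."""
    block_id: int


class ImageServer:
    def __init__(self, base_blocks, vm_names):
        self.base = dict(base_blocks)                   # shared base image
        self.incremental = {vm: {} for vm in vm_names}  # per-VM modifications

    def write(self, vm, block_id, content):
        """Record a modification in the VM's own incremental image."""
        self.incremental[vm][block_id] = content

    def read(self, vm, block_id):
        """Prefer the VM's own modification, else fall back to the base image."""
        entry = self.incremental[vm].get(block_id)
        if entry is None or isinstance(entry, BasePointer):
            return self.base[block_id]
        return entry

    def promote_common(self):
        """Copy modifications shared by all VMs into the base image."""
        images = list(self.incremental.values())
        if not images:
            return
        shared_ids = set.intersection(*(set(image) for image in images))
        for block_id in shared_ids:
            contents = {image[block_id] for image in images}
            if len(contents) != 1:
                continue  # the VMs disagree, so the change is not common
            content = next(iter(contents))
            if isinstance(content, BasePointer):
                continue  # already promoted earlier
            self.base[block_id] = content
            for image in images:
                # Replace the private copy with a pointer to the base image.
                image[block_id] = BasePointer(block_id)


server = ImageServer({0: b"kernel-v1", 1: b"config"}, ["vm1", "vm2"])
server.write("vm1", 0, b"kernel-v2")
server.write("vm2", 0, b"kernel-v2")   # same change on both VMs
server.write("vm2", 1, b"config-vm2")  # change specific to vm2
server.promote_common()
```

After `promote_common()` runs, block 0 is stored once in the base image and both incremental images hold only a pointer to it, while the change to block 1 remains private to `vm2`.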