Challenges exist in the de-duplication of virtual machine (VM) images. Input/output (I/O) contention can occur when VM images are accessed, and managing such contention can be important for application performance, user experience, and infrastructure cost. Managing such contention can be difficult, however, due to, for example, a high-density virtualized environment, a large number of VM images, and/or limited resources (memory and disk space) for caching images locally on compute nodes.
Different VM images often have common portions of data. Reasons for similarity can include, for example, similar operating systems, similar applications, and/or the fact that many new images are created by slightly modifying existing images. Accordingly, VM image access de-duplication aims to avoid I/O operations on blocks with identical content.
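By way of illustration only, the notion of avoiding I/O operations on blocks with identical content can be sketched as a block cache keyed by content digest. The sketch below is not drawn from any particular existing system. The 4096-byte block size, the use of SHA-256 digests, and the names `block_digests`, `DedupBlockReader`, `read_block`, and `io_reads` are assumptions introduced for the example, and the in-memory byte strings merely stand in for VM images held on backing storage.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes (illustrative only)


def block_digests(image, block_size=BLOCK_SIZE):
    """Return one content digest per fixed-size block of an image."""
    return [
        hashlib.sha256(image[off:off + block_size]).hexdigest()
        for off in range(0, len(image), block_size)
    ]


class DedupBlockReader:
    """Hypothetical reader that serves blocks from a cache keyed by content digest."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.cache = {}    # content digest -> block bytes
        self.io_reads = 0  # reads that actually reached backing storage

    def read_block(self, image, digests, index):
        digest = digests[index]
        block = self.cache.get(digest)
        if block is None:
            # Cache miss: slicing the byte string stands in for a storage read.
            start = index * self.block_size
            block = image[start:start + self.block_size]
            self.io_reads += 1
            self.cache[digest] = block
        return block


# Two images that differ only in their second block, as when a new image
# is created by slightly modifying an existing one.
base = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
derived = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE

reader = DedupBlockReader()
requests = 0
for image in (base, derived):
    digests = block_digests(image)
    for i in range(len(digests)):
        reader.read_block(image, digests, i)
        requests += 1

print(requests, reader.io_reads)
```

In this sketch, the blocks that the second image shares with the first are served from the cache, so only the modified block triggers an additional storage read. The per-block digests are assumed to be available ahead of time (for example, computed once when an image is registered); computing them on every read would itself require reading the block and would defeat the purpose.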
Existing approaches include on-demand streaming of a VM image, which includes copy-on-read (CoR), copy-on-write (CoW), and adaptive pre-fetching. Such approaches, however, do not exploit image similarity. Existing approaches can also include the use of a de-duplicated VM image repository. Such approaches attempt to exploit image similarity to combat image sprawl, but lack run-time support (that is, retrieving an image requires reconstituting and copying the entire image).
Other approaches include a general de-duplicated file system, which attempts to exploit file content similarity to reduce disk space consumption, but requires replacing existing file systems. Also, such approaches only de-duplicate block allocation, not file access. Additionally, existing approaches can include VM memory page/cache sharing. Such approaches attempt to discover and share identical memory pages by scanning page content or exchanging page information, but introduce high overhead.