1. Field of the Invention
This invention relates to data compression in a computer storage environment. More specifically, the invention relates to management of virtual machines and a data compression tool for images of the virtual machines.
2. Background of the Invention
A virtual machine is a self-contained operating environment that behaves as if it were a separate computer, while allowing the underlying physical machine resources to be shared among multiple virtual machines. More specifically, each virtual machine operates as a whole machine, while a host of the virtual machine(s) manages resources to support each virtual machine. One of the challenges associated with employing multiple virtual machines is that each machine requires storage space. While storage costs are decreasing, storage remains an expense.
Data compression tools are known to reduce data storage requirements and to alleviate concerns with storage capacity limitations. More specifically, data compression enables storage devices to store the same amount of data with fewer bits. However, prior art storage techniques do not effectively address the storage needs of the virtual machine environment, or of other environments which require archiving of many gigabytes or more of data.
Accordingly, there is a need for a data compression tool that addresses the need to compress large quantities of data, such as those found in the virtual machine environment. More specifically, such a compression tool should identify and remove global redundancies within a large input window, without placing an undue burden on memory and data processing requirements. In one embodiment, a global redundancy is present between different virtual machines that are based on the same operating system, wherein the footprint of the files is the same, and only the user data changes between the different versions. In another embodiment, additional virtual machines can be added to an existing archive, benefiting from the global redundancies between the existing archive and the newly added virtual machines. In another embodiment, the differences between a reference data set and a target data set are detected using global redundancies. These differences can be used to create the target data set from the reference data set, which is especially valuable in networked environments: if the receiver already has the reference data set, only the differences need to be transmitted instead of the entire target data set. Accordingly, the data compression tool needs to address the global redundancies by referencing them during the compression process.
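By way of illustration only, the reference/target case described above can be sketched in Python. The sketch is not the claimed compression tool. It assumes fixed-size blocks, a SHA-256 block index, and the hypothetical names BLOCK_SIZE, split_blocks, make_delta, and apply_delta, none of which come from this disclosure. Blocks of the target that also occur in the reference are replaced by a block number, so only the remaining blocks would need to be stored or transmitted.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; chosen only for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Split a byte string into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_delta(reference, target, size=BLOCK_SIZE):
    """Encode `target` as references into `reference` plus literal blocks."""
    ref_blocks = split_blocks(reference, size)
    index = {}
    for pos, block in enumerate(ref_blocks):
        # Keep the first position at which each distinct block occurs.
        index.setdefault(hashlib.sha256(block).digest(), pos)

    delta = []
    for block in split_blocks(target, size):
        pos = index.get(hashlib.sha256(block).digest())
        # Compare the bytes as well, so a hash collision cannot corrupt output.
        if pos is not None and ref_blocks[pos] == block:
            delta.append(("ref", pos))    # redundant block: store its number
        else:
            delta.append(("raw", block))  # new data: store the bytes
    return delta


def apply_delta(reference, delta, size=BLOCK_SIZE):
    """Rebuild the target from the reference data set and the delta."""
    ref_blocks = split_blocks(reference, size)
    out = bytearray()
    for kind, value in delta:
        out += ref_blocks[value] if kind == "ref" else value
    return bytes(out)
```

For two images that share an operating system footprint and differ only in user data, most entries of the delta would be "ref" entries. A fixed block grid only detects redundancy that is aligned to block boundaries, so the sketch shows the principle rather than a practical design.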