Replication of virtual machine (VM) images requires exact matching of chunks or blocks of data. For example, virtual machine images and compressed binary data require evaluating byte-level commonalities, because their content types are unknown. Collision-resistant cryptographic hashes such as Message Digest 5 (MD5), Secure Hash Algorithm 1 (SHA-1), and Secure Hash Algorithm 2 (SHA-2) may be used for exact matching. Once all required blocks are available on a target node, the image can be reconstituted. Alternatively, the blocks may be streamed or fetched on demand. Content introspection, by contrast, can produce a non-bit-for-bit copy of the file system, thus reproducing images that are syntactically equivalent without being bit-for-bit identical.
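Hash-based exact matching of blocks can be illustrated with a minimal sketch. It assumes fixed-size 4096-byte chunks, uses SHA-256 (a member of the SHA-2 family) as the digest, and models the target node's block store as an in-memory dictionary keyed by digest. The function names `chunk_hashes`, `replicate`, and `reconstitute` are hypothetical and do not describe any particular system. Only chunks whose digests the target lacks are transferred, and the image is then rebuilt from the ordered manifest of digests.

```python
import hashlib

# Assumed fixed chunk size; real systems may instead use variable-size,
# content-defined chunking.
CHUNK_SIZE = 4096


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and return each chunk's SHA-256 digest, in order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def replicate(source: bytes, target_store: dict, chunk_size: int = CHUNK_SIZE):
    """Copy to target_store only the chunks whose digests it does not already hold.

    Returns the ordered manifest of digests and the number of chunks transferred.
    """
    manifest = chunk_hashes(source, chunk_size)
    sent = 0
    for index, digest in enumerate(manifest):
        if digest not in target_store:  # exact match by collision-resistant hash
            start = index * chunk_size
            target_store[digest] = source[start:start + chunk_size]
            sent += 1
    return manifest, sent


def reconstitute(manifest: list, target_store: dict) -> bytes:
    """Rebuild the image once every chunk named in the manifest is on the target."""
    return b"".join(target_store[digest] for digest in manifest)
```

For example, replicating a 12 KiB image made of two identical 4 KiB chunks followed by a different chunk transfers only two chunks. A later image that shares a chunk with the first transfers only its new chunks. A fixed chunk size keeps the sketch short, but a single inserted byte would shift every later chunk boundary, which is one reason variable-size chunking is often preferred in practice.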
Lossy compression is commonly used to compress multimedia data (audio, video, and images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is typically required for text and data files, such as bank records, text articles and virtual machine images.
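The difference between the two kinds of compression can be shown with a small sketch. It uses Python's standard `zlib` module as the lossless codec and a toy quantizer, the hypothetical `lossy_quantize`, in place of a real lossy audio or video codec. The lossless round trip restores the input byte for byte, which is what a VM image requires. The quantizer discards precision that cannot be recovered.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE); the output equals the input exactly."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list, step: float) -> list:
    """Toy lossy 'codec': round each sample to the nearest multiple of step.

    The per-sample error is at most step / 2 and cannot be undone.
    """
    return [round(s / step) * step for s in samples]
```

Applied to repetitive record-like text, `lossless_roundtrip` returns identical bytes, while `lossy_quantize([0.12, 0.49, 0.51], 0.5)` yields `[0.0, 0.5, 0.5]`. That error may be acceptable for media but is not acceptable for a disk image.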
Yet another form of content replication uses temporal staleness, in which replication is performed less frequently, based on application-specific precision metrics. This approximate replication may be useful for applications that can tolerate slightly stale data, because it insulates a backend storage service from excessive load, for example, high read rates with very high fan-out.
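Staleness-bounded replication can be sketched as a read-through replica that contacts the backend only when its cached copy has become too old. This is a minimal sketch, assuming that the application-specific precision metric is a maximum age in seconds. Other metrics, such as a bound on numeric deviation, are possible. The class `StaleReadReplica` and its parameters are hypothetical, and an injectable clock is used so that the behavior can be checked deterministically.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleReadReplica:
    """Read-through replica that serves cached values while they are no older
    than max_staleness seconds (an assumed, application-chosen bound)."""

    def __init__(self, backend_read: Callable[[str], Any], max_staleness: float,
                 clock: Callable[[], float] = time.monotonic):
        self._backend_read = backend_read
        self._max_staleness = max_staleness
        self._clock = clock
        self._cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, fetch time)
        self.backend_reads = 0  # counts load actually placed on the backend

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[1] <= self._max_staleness:
            return entry[0]  # possibly stale, but within the allowed bound
        value = self._backend_read(key)  # refresh from the backend
        self.backend_reads += 1
        self._cache[key] = (value, now)
        return value
```

With a five-second bound, a burst of thousands of reads of the same key within that window reaches the backend only once. Readers may see a value that is up to five seconds old. This trade-off is what shields the backend under high fan-out.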
Replaying or recreating virtual machines (VMs) or containers from build files is a technique that supports computational reproducibility. Because virtual machine files are large (e.g., multiple gigabytes), especially if they include raw data files, the scripts and code may be stored in public repositories separately from the virtual machine. Others can then examine and extend the analysis more easily and, in turn, generate the images themselves.
Existing methods use exact replication mechanisms for VM images across data centers. Exact replication may be infeasible in many situations because of potentially high communication costs or the high rate of churn of virtual machine images, for example, in development and operations (DevOps) environments.