Cloning data can be an efficient way to duplicate a file, disk volume, virtual machine, operating environment, or other type of stored data source. A clone of a data source is an exact copy of the entity at a particular time. Cloning a data source, however, may not be an efficient use of storage if each clone is allocated the same amount of storage space that is required by the original entity.
In a multi-user computer network, such as an enterprise network or a cloud-computing platform, multiple users may be provided with a similar or identical software environment, database, or other software entity. Users of a transaction-processing system, for example, may each be given access to an identical instance of a standardized transaction database, users of a cloud-computing service may each work within an identically configured virtual machine, and users associated with a particular security level may each be allowed access to a standardized set of resources associated with that particular level.
One way to simplify the provisioning and management of such computing environments is to use cloning to quickly duplicate a known file, disk volume, virtual machine, operating environment, or other type of stored data source that will be provided to multiple users.
In one example, a cloning methodology may provide a standardized virtualized operating environment to every user of a cloud-computing platform. Here, a cloning tool may be used to quickly create an exact copy—or “clone”—of the original standardized environment. This resulting “parent” clone is a fixed snapshot from which an identical “child” clone copy may be created for each user added to the cloud. Although the cloned child environments associated with the new users are initially identical, each user may subsequently alter or customize his or her cloned environment in any way allowed by a system administrator.
One advantage of known cloning methodologies is that, because all clones are identical, there may be no need to initially allocate distinct areas of physical storage to each clone. If, for example, a cloned parent requires 1 GB of storage space, each cloned child may initially point to the data stored in the identical cloned parent, rather than requiring an additional 1 GB of its own preallocated physical storage. In this way, many child clones may share the same area of physical storage space and the same data stored in that storage space.
This efficiency, however, lasts only until a user updates the data stored in his or her cloned child. Because each update may create unique data that differs from corresponding data comprised by the parent clone, this unique, updated data must be stored in a distinct, previously unallocated, physical storage location. Thus, every time a user updates or otherwise alters the original child cloned image, additional physical storage must be allocated on demand to store the updated or altered information.
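The sharing and on-demand allocation described above can be illustrated with a minimal copy-on-write sketch. It is not taken from any particular cloning tool: the class names `ParentClone` and `ChildClone`, the block-indexed layout, and the in-memory dictionary standing in for newly allocated physical storage are all assumptions made for illustration.

```python
class ParentClone:
    """Fixed snapshot whose blocks are shared by every child (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = tuple(blocks)  # immutable once the snapshot is taken


class ChildClone:
    """Child that initially points at the parent's data (hypothetical)."""

    def __init__(self, parent):
        self.parent = parent
        # Block index -> updated data. Stands in for physical storage
        # that is allocated only when the child diverges from the parent.
        self.own_blocks = {}

    def read(self, index):
        # Use the child's private copy if one exists, else the shared parent block.
        return self.own_blocks.get(index, self.parent.blocks[index])

    def write(self, index, data):
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError(index)
        # The first write to a block consumes new storage; later writes reuse it.
        self.own_blocks[index] = data

    def allocated_blocks(self):
        return len(self.own_blocks)


parent = ParentClone([b"A", b"B", b"C", b"D"])
alice = ChildClone(parent)
bob = ChildClone(parent)

alice.write(1, b"X")  # only this update needs storage of its own
```

In this sketch, `alice` holds one private block after her update, while `bob` still reads every block from the shared parent and holds none.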
One problem created by this method of allocating physical storage on demand is that on-demand allocation may prevent an administrator from accurately estimating how much physical storage to preallocate to a newly cloned child. In the above example, a user who repeatedly updates only a small portion of his or her cloned image may over time require a relatively small amount of additional physical storage space. But a user who routinely makes sweeping changes to a large portion of his or her cloned image may eventually require nearly the entire 1 GB of additional physical storage. This unpredictability may hamper an administrator's attempts to efficiently manage storage resources.
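The gap between these two usage patterns can be made concrete with a rough calculation. The sketch below assumes that storage is allocated in fixed-size blocks and that a child needs one new block for each distinct parent block it modifies. The 4096-byte block size, the reading of 1 GB as 2**30 bytes, and the function name `additional_storage` are illustrative assumptions, not details from any specific system.

```python
BLOCK_SIZE = 4096            # bytes per block (assumed)
PARENT_SIZE = 1 * 1024**3    # the 1 GB parent, taken here as 2**30 bytes


def additional_storage(write_offsets):
    """Bytes a child needs beyond the shared parent, given byte offsets written."""
    # Repeated writes to the same block cost nothing extra after the first.
    distinct_blocks = {offset // BLOCK_SIZE for offset in write_offsets}
    return len(distinct_blocks) * BLOCK_SIZE


# User 1: 10,000 updates, all confined to the first 64 KiB of the image.
narrow = [(i * 512) % (64 * 1024) for i in range(10_000)]

# User 2: one update to every block of the image.
sweeping = range(0, PARENT_SIZE, BLOCK_SIZE)

narrow_bytes = additional_storage(narrow)      # 16 blocks = 65,536 bytes
sweeping_bytes = additional_storage(sweeping)  # the full 2**30 bytes
```

Under these assumptions the first user needs 64 KiB of additional storage despite issuing thousands of updates, while the second needs the full size of the parent, which is why the number of updates alone is a poor predictor of storage demand.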
Another problem is that an on-demand method of physical storage allocation generally increases the difficulty of allocating storage that is contiguous or that resides on a single volume. Such a problem may occur when a user creates a need for additional physical storage by updating his or her cloned image, but no physical storage is available that is contiguous with storage space already allocated to the user's clone, or that resides on the same volume as physical storage space already allocated to the user's clone. Because data scattered across multiple, noncontiguous areas of a physical storage medium, or stored on multiple volumes, may take longer to access, this problem can adversely affect system performance.
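The fragmentation effect can be sketched with a toy allocator. The example assumes a single volume that hands out the next free block to whichever clone asks first; the `Volume` class and the `count_extents` helper are hypothetical names introduced only for this illustration.

```python
def count_extents(physical_blocks):
    """Count the contiguous runs in a collection of physical block numbers."""
    blocks = sorted(physical_blocks)
    if not blocks:
        return 0
    # A new extent starts wherever a block does not directly follow its predecessor.
    return 1 + sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur != prev + 1)


class Volume:
    """Toy volume that always hands out the next free block (hypothetical)."""

    def __init__(self):
        self.next_free = 0

    def allocate(self):
        block = self.next_free
        self.next_free += 1
        return block


# On demand: two users' updates interleave, so their blocks interleave too.
shared = Volume()
alice_blocks, bob_blocks = [], []
for _ in range(4):
    alice_blocks.append(shared.allocate())
    bob_blocks.append(shared.allocate())

# For comparison: the same four blocks obtained in one uninterrupted request.
other = Volume()
carol_blocks = [other.allocate() for _ in range(4)]

alice_extents = count_extents(alice_blocks)  # blocks 0, 2, 4, 6
carol_extents = count_extents(carol_blocks)  # blocks 0, 1, 2, 3
```

Here each on-demand user ends up with four separate single-block extents, whereas the blocks obtained together form one contiguous extent. Real allocators are more sophisticated, but the sketch shows how interleaved on-demand requests can scatter a clone's data.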