The solution according to one or more embodiments of the present invention generally relates to the data-processing field. More specifically, this solution relates to the management of software images.
The management of software images is a key activity in modern data-processing centers. Generally speaking, a software image is a structure that encapsulates the files residing on a (physical or virtual) data-processing machine, for example, the files storing its operating system, application programs, and/or data. Each data-processing center may then be seen simply as a set of portable software images. The software images can be moved, copied, replicated, protected, and profiled in a very simple way; as a result, the efficiency of the data-processing center may be significantly increased. These advantages are particularly evident when the software images are used in virtual machines (i.e., software emulations of physical machines); indeed, in this case any kind of virtual machine may be provisioned on demand by simply creating a new virtual machine and then booting it from a desired software image (also referred to as a virtual image in this case). This is particularly useful in cloud computing, wherein multiple data-processing services are provided to client computers that are completely agnostic of their physical implementation.
However, the management of the software images may be challenging, especially in large data-processing centers with an image repository that provides centralized access to a very large number of software images (up to several thousand).
For instance, one problem that arises as the number of software images increases is their consumption of storage space on the storage devices (for example, hard disks) of the data-processing center where they are stored. To tackle this problem, U.S. Patent Publication 2006/0155735 proposes splitting the software images into segments, each of which is stored only once in the image repository (so as to avoid duplicating equal segments across different software images); for this purpose, each software image is represented by a vector pointing to its segments in the order in which they appear in the software image.
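The segment-based representation described above can be illustrated with a minimal sketch. It assumes fixed-size segments and identifies equal segments by a SHA-256 hash of their content; both choices, as well as the names `SegmentRepository`, `add_image` and `read_image`, are hypothetical and are not taken from the cited publication. Each distinct segment is stored once, and each image is kept only as the ordered vector of its segment keys.

```python
import hashlib
from typing import Dict, List


class SegmentRepository:
    """Hypothetical image repository storing each distinct segment only once."""

    def __init__(self, segment_size: int = 4) -> None:
        self.segment_size = segment_size           # assumption: fixed-size segments
        self.segments: Dict[str, bytes] = {}       # segment key -> segment content
        self.images: Dict[str, List[str]] = {}     # image name -> vector of segment keys

    def add_image(self, name: str, data: bytes) -> List[str]:
        vector: List[str] = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            key = hashlib.sha256(segment).hexdigest()  # assumption: equality by content hash
            self.segments.setdefault(key, segment)     # stored only if not already present
            vector.append(key)                         # keeps the order within the image
        self.images[name] = vector
        return vector

    def read_image(self, name: str) -> bytes:
        # Rebuild the image by following its vector of segment keys in order.
        return b"".join(self.segments[key] for key in self.images[name])
```

For example, adding `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"` as two images stores only four segments, since the first two segments are shared.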
Another problem may be due to the latency of the image repository. Indeed, the files of each software image are typically stored on the storage devices in individually accessible blocks, for example, sectors of a hard disk. However, the time required to access each block on the hard disk is relatively high compared with the time required to process it. To cope with this problem, pre-fetching techniques are commonly used: whenever a block is accessed, a set of subsequent blocks is read from the hard disk at the same time and stored in a cache memory, so that they are readily available if requested shortly afterwards.
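The pre-fetching mechanism can be sketched as follows, simulating the hard disk as a list of blocks. The pre-fetch depth, the cache capacity, the least-recently-used eviction policy and the name `PrefetchingReader` are illustrative assumptions; real storage stacks implement read-ahead with their own policies.

```python
from collections import OrderedDict
from typing import Sequence


class PrefetchingReader:
    """Hypothetical reader that pre-fetches the next blocks on every cache miss."""

    def __init__(self, disk: Sequence[bytes], prefetch: int = 3, capacity: int = 16) -> None:
        self.disk = disk
        self.prefetch = prefetch   # number of next blocks read together with the requested one
        self.capacity = capacity   # maximum number of blocks kept in the cache
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_accesses = 0     # each miss costs one (slow) disk access
        self.hits = 0

    def read(self, block: int) -> bytes:
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)  # mark as most recently used
            return self.cache[block]
        self.disk_accesses += 1
        data = self.disk[block]
        last = min(block + self.prefetch, len(self.disk) - 1)
        for b in range(block, last + 1):   # requested block plus the next ones
            self.cache[b] = self.disk[b]
            self.cache.move_to_end(b)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return data
```

Reading blocks 0 to 7 in order with `prefetch=3` costs only two disk accesses (for blocks 0 and 4); the other six reads are served from the cache.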
In any case, the blocks of each file of the software image are generally not contiguous on the hard disk; in particular, the blocks storing the actual content of a file are typically interleaved with the blocks of other files (since their position on the hard disk depends on when they were written). Therefore, accessing non-contiguous blocks of the software image in succession significantly degrades the access time. Indeed, owing to the mechanical nature of the rotating disk and the moving head of the hard disk, this increases the time required by the head to reach the concentric track of the disk that stores the next block and/or the time required by the next block within the track to reach the head; moreover, the cache memory becomes ineffective, since the pre-fetched blocks are likely to be useless.
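The effect of non-contiguous blocks can be quantified with a simplified simulation. It assumes that each miss reads the requested block plus `prefetch` following blocks, that only the blocks of the last access remain buffered, and that the distance (in blocks) between the head position and the next requested block is a rough proxy for mechanical delay; the function name `simulate_reads` is hypothetical.

```python
from typing import Sequence, Set, Tuple


def simulate_reads(layout: Sequence[int], prefetch: int = 3) -> Tuple[int, int]:
    """Read a file whose i-th logical block is stored at physical block layout[i].

    Returns (disk_accesses, head_travel): the number of reads not served by the
    pre-fetched blocks, and the total distance (in blocks) the head moved.
    """
    buffered: Set[int] = set()
    accesses = 0
    travel = 0
    head = layout[0] if layout else 0
    for physical in layout:
        if physical in buffered:
            continue                       # served from the pre-fetched blocks
        accesses += 1
        travel += abs(physical - head)     # proxy for seek and rotational delay
        buffered = set(range(physical, physical + prefetch + 1))
        head = physical + prefetch + 1     # head ends just past the blocks read
    return accesses, travel
```

With a contiguous layout such as `list(range(8))` the file is read with 2 accesses and no head travel, whereas the scattered layout `[0, 9, 2, 14, 5, 20, 11, 3]` requires 8 accesses and 73 blocks of head travel, since none of the pre-fetched blocks is ever used.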
To alleviate this problem, the hard disk might be defragmented with standard tools, which reorganize it by compacting the blocks of each file.
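The compaction performed by defragmentation can be sketched as the computation of a new block allocation in which each file occupies consecutive blocks. This is a hypothetical illustration: the names `defragment` and `is_contiguous` are invented here, files are simply laid out one after another in name order, and the actual copying of block contents that a real tool performs is omitted.

```python
from typing import Dict, List


def defragment(files: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a new allocation in which each file occupies consecutive blocks.

    files maps a file name to the physical blocks holding its logical blocks, in order.
    Files are placed one after another starting at block 0 (assumption: name order).
    """
    new_layout: Dict[str, List[int]] = {}
    next_free = 0
    for name in sorted(files):
        count = len(files[name])
        new_layout[name] = list(range(next_free, next_free + count))
        next_free += count
    return new_layout


def is_contiguous(blocks: List[int]) -> bool:
    """True if the blocks follow one another without gaps, in logical order."""
    return all(b2 == b1 + 1 for b1, b2 in zip(blocks, blocks[1:]))
```

For example, the interleaved allocation `{"a.cfg": [0, 3, 6], "b.bin": [1, 4], "c.log": [2, 5, 7]}` becomes `{"a.cfg": [0, 1, 2], "b.bin": [3, 4], "c.log": [5, 6, 7]}`. The sketch makes each file contiguous but does not decide where service information is stored or in which order different files will be read.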
However, this technique is completely ineffective against the latency incurred when the service information required to access the files must be read (since this information is typically stored in a reserved portion of the hard disk); moreover, the same applies when different files are accessed in succession (since their blocks are generally not contiguous with one another on the hard disk).