Distributed computational systems such as content management systems manage large amounts of data in storage systems that are typically geographically distributed. These distributed computational systems comprise methods for migrating data from one or more source storage devices to one or more target storage devices. For example, data in the distributed computational systems are migrated in a batch to a new storage system when the source storage system is replaced.
A distributed computational system such as a content management system manages diverse data objects such as files, documents, images, video, and audio. Conventional content management systems archive data objects by continually migrating unused data objects from the source storage devices, which act as main storage devices, to the target storage devices, which act as archive storage devices. The source storage devices comprise memory that can be quickly accessed. The target storage devices comprise slower memory such as, for example, an optical disk. The data objects are archived to free up the faster memory and to manage storage of the data objects.
Content management systems typically migrate data for archiving based on a predetermined migration policy. The migration policy is set by a system administrator and comprises a predetermined allowed storage duration, expressed as the elapsed time during which a data object may reside on the main storage device. A resource manager of the content management system monitors data objects on the main storage device. When the amount of time that a data object has resided on the main storage device exceeds the allowed storage duration, the resource manager migrates the data object to the archive storage device. Although this technology has proven to be useful, it would be desirable to present additional improvements.
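The duration-based policy described above can be illustrated with a minimal sketch. The names DataObject, select_for_migration, and migrate are hypothetical and are introduced only for illustration; they do not correspond to any particular content management system, and the resource manager is reduced here to two plain functions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataObject:
    """Hypothetical record for a data object tracked by the resource manager."""
    object_id: str
    imported_at: datetime      # date of importation into main storage
    location: str = "main"     # "main" or "archive"


def select_for_migration(objects, allowed_duration, now):
    """Return objects whose residence time on main storage exceeds the allowed duration."""
    return [
        obj for obj in objects
        if obj.location == "main" and now - obj.imported_at > allowed_duration
    ]


def migrate(objects, allowed_duration, now):
    """Mark every expired object as archived and return the migrated objects."""
    migrated = select_for_migration(objects, allowed_duration, now)
    for obj in migrated:
        # Assumption: the actual data transfer is out of scope; only the location changes.
        obj.location = "archive"
    return migrated


if __name__ == "__main__":
    now = datetime(2024, 3, 1)
    objects = [
        DataObject("report.pdf", datetime(2024, 1, 1)),
        DataObject("photo.jpg", datetime(2024, 2, 20)),
    ]
    for obj in migrate(objects, timedelta(days=30), now):
        print(f"archived {obj.object_id}")
```

Note that the selection depends only on the date of importation; no other property of the data object is consulted.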
Conventional migration policies for content management systems are based only on the allowed storage duration, measured from the date of importation. Regardless of how often a data object is retrieved, the data object is archived once it has resided in the main storage device for the predetermined allowed storage duration such as, for example, 30 days. Retrieval of a data object may be requested after the data object has been migrated to the archive storage device. Consequently, the resource manager has to retrieve the data object from the archive storage device. This transfer of the data object to and from the archive storage device is inefficient and costly in terms of system resources such as bandwidth. Furthermore, a large data object that is likely to be retrieved may be migrated to the archive storage device. When required by a user, the large data object is retrieved. This retrieval may take several hours or even days. Consequently, the large data object is not available to the user during the retrieval time, and the bandwidth consumed in migrating the data object to the archive storage device and retrieving it from the archive storage device is wasted.
Conventional migration policies apply the allowed storage duration to all data objects indiscriminately. Reducing the allowed storage duration increases the number of retrievals of data objects that have been migrated to the archive storage device. Increasing the allowed storage duration requires additional storage space in the main storage device, which is typically a more expensive, faster storage space.
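The trade-off in choosing a single allowed storage duration can be made concrete with a small, purely illustrative calculation. The object sizes, import days, and access days below are invented sample values, not measurements, and the function name simulate is hypothetical. The sketch also assumes, for simplicity, that an object retrieved from the archive is not moved back to main storage.

```python
def simulate(objects, allowed_days):
    """Estimate the cost of a uniform allowed storage duration.

    objects: list of (size_gb, import_day, access_days) tuples.
    Returns (archive_retrievals, main_storage_gb_days).
    """
    archive_retrievals = 0
    main_storage_gb_days = 0
    for size_gb, import_day, access_days in objects:
        # An access is served from the archive if the object's residence time
        # had already exceeded the allowed duration on the day of the access.
        archive_retrievals += sum(
            1 for day in access_days if day - import_day > allowed_days
        )
        # Each object occupies main storage for the full allowed duration.
        main_storage_gb_days += size_gb * allowed_days
    return archive_retrievals, main_storage_gb_days


# Hypothetical sample data: (size in GB, import day, days on which it is accessed).
SAMPLE_OBJECTS = [
    (10, 0, [5, 40, 45]),
    (200, 0, [35]),
    (1, 10, []),
]

if __name__ == "__main__":
    for allowed_days in (30, 60):
        retrievals, gb_days = simulate(SAMPLE_OBJECTS, allowed_days)
        print(f"{allowed_days} days: {retrievals} archive retrievals, {gb_days} GB-days")
```

With this sample data, a 30-day duration yields 3 archive retrievals at 6330 GB-days of main storage, whereas a 60-day duration yields 0 archive retrievals at 12660 GB-days. The never-accessed 1 GB object consumes main storage for the full duration in both cases, which is the cost of applying one duration to every object.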
What is therefore needed is a system, a computer program product, and an associated method for selecting data objects for migration that considers, in the migration policy, properties of data objects such as data object format, data object size, reference date, data object version, etc. The need for such a solution has heretofore remained unsatisfied.