Content management systems store different types of data. One type is unstructured data (e.g., text documents, Portable Document Format (PDF) files, images, etc.). Such unstructured data items may be referred to as objects, content, or content objects. Another type is metadata that describes the unstructured data (e.g., content objects).
The metadata may be classified into two categories. One category includes user attributes related to a business. Typically, these user attributes are created by the user or generated by the user's application. For example, in an insurance company, an image of a car accident may have attributes such as accident date and accident location. The other category includes system attributes, which are defined or generated by the content management system. Examples include the creation date of an image, an identifier of the user who ingested the image, the location where the image is stored, and the image size.
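The two metadata categories can be pictured as a simple record. The following Python sketch is illustrative only: the class names, field names, and sample values are hypothetical and are not taken from any particular content management system.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Any, Dict


@dataclass
class SystemAttributes:
    """Attributes defined or generated by the content management system."""
    create_date: datetime      # when the content object was created
    ingested_by: str           # identifier of the user who ingested it
    storage_location: str      # where the content object is stored
    size_bytes: int            # size of the content object


@dataclass
class ContentMetadata:
    """Metadata describing one content object (hypothetical layout)."""
    system: SystemAttributes
    # User attributes are business-specific, so they are kept as a free-form map.
    user: Dict[str, Any] = field(default_factory=dict)


# Example: an insurance company's image of a car accident (made-up values).
accident_image = ContentMetadata(
    system=SystemAttributes(
        create_date=datetime(2024, 1, 10, 9, 30),
        ingested_by="user-0042",
        storage_location="/store/primary/img-001",
        size_bytes=2_500_000,
    ),
    user={
        "accident_date": date(2024, 1, 8),
        "accident_location": "Main St and 5th Ave",
    },
)
```

Keeping the system attributes in a fixed structure and the user attributes in a map reflects the distinction drawn above: the former are known to the system in advance, while the latter vary by business.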
Data migration is commonly used to implement Hierarchical Storage Management (HSM) for archive purposes.
Before migration or archiving is performed, a readiness check is run on each candidate content object, based on system-defined attributes, to ensure data consistency. The check may include, but is not limited to, determining whether the candidate content object is currently in the middle of a transaction, whether it is still in retention, or whether a business application has placed a hold on it. Because this check happens during the migration or archiving, it consumes part of the migration or archiving time window and leaves less time for moving content objects.
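The readiness check described above might look like the following minimal Python sketch. The attribute names (in_transaction, retention_until, on_hold) and the helper functions are hypothetical, and the sketch assumes that any one of the three conditions makes a candidate not ready; an actual system may define further checks or apply them differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CandidateObject:
    """A migration candidate with the system attributes used by the check."""
    object_id: str
    in_transaction: bool = False            # currently part of a transaction
    retention_until: Optional[date] = None  # last day of retention, if any
    on_hold: bool = False                   # hold placed by a business application


def is_ready_for_migration(obj: CandidateObject, today: date) -> bool:
    """Return True only if none of the blocking conditions applies (assumed policy)."""
    if obj.in_transaction:
        return False
    if obj.retention_until is not None and today <= obj.retention_until:
        return False  # still in retention
    if obj.on_hold:
        return False
    return True


def select_ready(candidates: List[CandidateObject], today: date) -> List[CandidateObject]:
    """Filter the candidates down to those that pass the readiness check."""
    return [c for c in candidates if is_ready_for_migration(c, today)]


# Usage with made-up candidates.
today = date(2024, 1, 15)
candidates = [
    CandidateObject("doc-1"),
    CandidateObject("doc-2", in_transaction=True),
    CandidateObject("doc-3", retention_until=date(2024, 6, 30)),
    CandidateObject("doc-4", on_hold=True),
    CandidateObject("doc-5", retention_until=date(2023, 12, 31)),
]
ready_ids = [c.object_id for c in select_ready(candidates, today)]
```

When a filter of this kind runs inside the migration job itself, every candidate is examined within the migration window, which is the time cost noted above.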
In most business systems, content migration is scheduled for off-hours, for example, at night. When the data volume is large, migrating all the content objects within several hours is a challenge. Some business systems also require that the migration finish within a small time window.
Because the migration happens during off-hours, there may be no human intervention, and there may be no way to know how many content objects are to be migrated. Administrators also have no chance to review the candidate content objects before the migration.