In data archival and/or backup environments, data objects such as files often need to be stored within an archival/backup system. Such systems often use single instancing to try to prevent multiple copies of the same data object from being stored in the archival/backup environment.
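By way of illustration only, single instancing may be sketched as a store that keys each data object by a fingerprint of its content, so that a second copy of identical content is recognised and not stored again. The class name `SingleInstanceStore`, its methods, and the choice of SHA-256 as the fingerprint are assumptions made for this sketch; the text above does not specify how duplicate data objects are detected.

```python
import hashlib
from typing import Dict, Tuple


class SingleInstanceStore:
    """Toy content-addressed store: identical content is kept only once.

    Hypothetical illustration; SHA-256 fingerprinting is an assumption.
    """

    def __init__(self) -> None:
        # Maps fingerprint (hex digest) -> stored content.
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> Tuple[str, bool]:
        """Store content if unseen; return (fingerprint, was_newly_stored)."""
        fingerprint = hashlib.sha256(content).hexdigest()
        is_new = fingerprint not in self._objects
        if is_new:
            self._objects[fingerprint] = content
        return fingerprint, is_new

    def stored_count(self) -> int:
        """Number of distinct data objects actually held by the store."""
        return len(self._objects)
```

In this sketch, calling `put` twice with the same bytes returns the same fingerprint both times, reports the object as new only on the first call, and leaves a single stored instance.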
In some data archival and/or backup systems, large files are split into a number of equal-sized units commonly known as segments. In this way, when data is appended to a file which has already been archived/backed-up, a later archival/backup operation need only create segments corresponding to the new data. This solution is particularly useful for files such as MS Outlook™ .pst files, which may have many kilobytes or megabytes appended in a single day if a user receives or sends a number of large messages in that day. By using such a solution, the amount of data for archival or backup can be substantially reduced by avoiding making archival/backup copies of previously archived/backed-up data.
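The segmenting approach described above may be sketched as follows, purely as an illustration. The function names, the 4-byte segment size (chosen unrealistically small so the behaviour is easy to follow), and the use of SHA-256 fingerprints to recognise previously stored segments are all assumptions of this sketch rather than details taken from any particular system.

```python
import hashlib
from typing import List, Set

SEGMENT_SIZE = 4  # bytes; hypothetical and far smaller than a real system would use


def split_into_segments(data: bytes, size: int = SEGMENT_SIZE) -> List[bytes]:
    """Split data into equal-sized segments; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segments_to_back_up(data: bytes, known: Set[str],
                        size: int = SEGMENT_SIZE) -> List[bytes]:
    """Return only the segments whose fingerprint is not yet in `known`.

    `known` is updated in place so that a later call sees earlier segments.
    """
    new_segments = []
    for segment in split_into_segments(data, size):
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in known:
            known.add(fingerprint)
            new_segments.append(segment)
    return new_segments


known_fingerprints: Set[str] = set()
first_run = segments_to_back_up(b"AAAABBBBCCCC", known_fingerprints)
# Four bytes are appended to the already backed-up file.
second_run = segments_to_back_up(b"AAAABBBBCCCCDDDD", known_fingerprints)
```

Here the first run yields all three segments, while the second run yields only the segment holding the appended data. Note that in this sketch, if the file length before the append is not a multiple of the segment size, the final partial segment changes and is selected again along with the new data.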
When a file or segment is identified as a possible backup candidate, it is typically sent to a backup server for storage in the archival/backup system. Conventional systems typically provide this functionality by assembling a package of data for backup at a backup agent and sending that package in bulk to a central backup server. This approach, whilst much better than simply sending everything to the backup server in an unmanaged way, typically results in 10-12% of all new data objects created within the network being sent for backup.
The present invention has been made, at least in part, in consideration of drawbacks and limitations of such conventional systems.