The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for optimizing migration/copying of de-duplicated data.
In one illustrative embodiment, a method, in a data processing system, is provided for optimizing migration/copying of de-duplicated data from an internal storage system to a removable storage system. The illustrative embodiment determines a preliminary number of clusters to be generated for sets of data objects stored on the internal storage system based on a number of the sets of data objects. The illustrative embodiment generates the preliminary number of clusters based on shortest distances between the sets of data objects. In the illustrative embodiment, each cluster comprises one or more sets of data objects and each set of data objects comprises one or more chunks of data. The illustrative embodiment identifies a chosen cluster from a set of clusters by identifying a cluster having a greatest number of common chunks within as few sets of data objects as possible. The illustrative embodiment determines whether an export-size of the chosen cluster exceeds an available storage capacity of the removable storage system. The illustrative embodiment exports the chosen cluster to the removable storage system in response to the export-size of the chosen cluster failing to exceed the available storage capacity of the removable storage system.
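By way of a non-limiting illustration, the operations outlined above may be sketched as follows. The sketch rests on several assumptions that are not taken from the illustrative embodiment: each set of data objects is modeled as a Python set of chunk identifiers, the distance between two sets is taken to be the Jaccard distance, the preliminary number of clusters is derived with a hypothetical square-root heuristic, and clustering is done by simple single-linkage merging. All function names, chunk identifiers, sizes, and the capacity value are likewise hypothetical.

```python
from itertools import combinations
from math import ceil, sqrt


def jaccard_distance(a, b):
    """Distance between two chunk-ID sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def preliminary_cluster_count(num_sets):
    """Hypothetical heuristic; the text only says the count depends on the number of sets."""
    return max(1, ceil(sqrt(num_sets / 2)))


def build_clusters(object_sets, k):
    """Single-linkage merging (an assumption): join the two closest clusters until k remain."""
    clusters = [[name] for name in sorted(object_sets)]
    while len(clusters) > k:
        def linkage(pair):
            i, j = pair
            return min(jaccard_distance(object_sets[x], object_sets[y])
                       for x in clusters[i] for y in clusters[j])
        # combinations() yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def common_chunks(cluster, object_sets):
    """Chunks that occur in at least two sets of the cluster."""
    seen, shared = set(), set()
    for name in cluster:
        shared |= seen & object_sets[name]
        seen |= object_sets[name]
    return shared


def choose_cluster(clusters, object_sets):
    """Most common chunks first; ties go to the cluster with fewer sets."""
    return max(clusters, key=lambda c: (len(common_chunks(c, object_sets)), -len(c)))


def export_size(cluster, object_sets, chunk_size):
    """De-duplicated size: each distinct chunk is counted once."""
    distinct = set().union(*(object_sets[name] for name in cluster))
    return sum(chunk_size[c] for c in distinct)


# Hypothetical example data: four sets of data objects and their chunks.
object_sets = {
    "A": {"c1", "c2", "c3"},
    "B": {"c2", "c3", "c4"},
    "C": {"c7", "c8"},
    "D": {"c8", "c9"},
}
chunk_size = {c: 10 for chunks in object_sets.values() for c in chunks}
capacity = 50  # available capacity of the removable storage system (assumed units)

k = preliminary_cluster_count(len(object_sets))
clusters = build_clusters(object_sets, k)
chosen = choose_cluster(clusters, object_sets)
size = export_size(chosen, object_sets, chunk_size)
# "Failing to exceed" the capacity means the size is less than or equal to it.
fits = size <= capacity
print(chosen, size, "export" if fits else "does not fit")
```

In this sketch the export-size counts each distinct chunk only once, which reflects the assumption that the chosen cluster is written to the removable storage system in de-duplicated form.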
In other illustrative embodiments, a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.