Conventional deduplication is a well-proven method of increasing the functional storage capacity of a system. It is based on pattern matching: when data matching a previously seen pattern is found, that data is replaced with a reference to a single stored version of the data. Data matching may be performed in a number of ways, including matching a whole file/object (i.e., finding identical files/objects), matching bit patterns in fixed-size blocks of the file/object, and matching bit patterns in fixed-size blocks using a sliding window across the file/object.
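By way of illustration only, fixed-block matching of the kind described above might be sketched as follows. The function names `dedup_fixed_blocks` and `restore`, the 8-byte block size, and the use of SHA-256 digests as a stand-in for direct bit-pattern comparison are assumptions made for this example and are not taken from any particular deduplication system.

```python
import hashlib


def dedup_fixed_blocks(data: bytes, block_size: int = 8):
    """Split data into fixed-size blocks and keep one copy of each distinct block.

    Hypothetical helper: a digest is used as a proxy for bit-pattern equality,
    assuming hash collisions are negligible.
    """
    store = {}  # digest -> the single stored version of that block
    refs = []   # ordered references that stand in for the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time
        refs.append(digest)              # every occurrence becomes a reference
    return store, refs


def restore(store, refs) -> bytes:
    """Rebuild the original data by following the references in order."""
    return b"".join(store[digest] for digest in refs)


# Three identical blocks followed by one different block.
data = b"ABCDEFGH" * 3 + b"12345678"
store, refs = dedup_fixed_blocks(data)
```

In this sketch, four blocks are referenced but only two distinct blocks are stored. Whole-file/object matching corresponds to using a block size equal to the file/object length; a sliding-window variant would additionally test block boundaries at every byte offset.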
In all these approaches, deduplication is done by matching bits across two sources (or a source and a library), while being indifferent to the nature of the source. As such, traditional deduplication systems are incapable of recognizing files/objects that are closely related from an end-user information content point of view. For example, traditional deduplication systems are not able to recognize that a plain text object, the compressed version of that object, and the encrypted version of that object all represent the same fundamental data, since all those versions have different bit patterns. Hence, these three copies would not be deduplicated.
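The limitation described above may be illustrated with a minimal sketch, under stated assumptions. Here `fingerprint` and `toy_encrypt` are hypothetical helpers, the key is an arbitrary example value, and the repeating-key XOR is merely a stand-in for a real cipher (it is not secure). The compression uses the standard `zlib` module.

```python
import hashlib
import zlib

KEY = b"hypothetical-key"  # arbitrary example key, assumed for illustration


def fingerprint(data: bytes) -> str:
    """Digest used as a proxy for the bit pattern of the data."""
    return hashlib.sha256(data).hexdigest()


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: an illustrative stand-in for encryption, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plain = b"the same end-user content " * 20
compressed = zlib.compress(plain)
encrypted = toy_encrypt(plain, KEY)

# A bit-matching deduplicator compares fingerprints such as these.
fingerprints = {fingerprint(plain), fingerprint(compressed), fingerprint(encrypted)}
```

All three versions carry the same information, since decompressing or decrypting recovers `plain`, yet the set `fingerprints` contains three distinct values, so a system that matches only bit patterns would store all three copies.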
The above information disclosed in this Background section is only for enhancement of understanding of the present disclosure, and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.