The present invention relates to dividing an address space into sub-volumes and to content affinity between the sub-volumes. More specifically, the invention relates to computing content affinity between sub-volumes, and using this computed content affinity for data placement.
De-duplication reduces the number of data storage devices needed to store a given amount of information. It operates by detecting repetition of identical chunks of data and, in some instances, replacing a repeated copy with a reference to another copy of the same content. A de-duplication system also provides for reconstructing the original form of a given piece of content that has been stored in a compressed manner. References are used to locate the original copies of the data so that the full-length form of the desired content can be delivered.
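By way of illustration only, the detection, replacement, and reconstruction steps described above can be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the class name `DedupStore`, the fixed chunk size, and the use of SHA-256 digests as chunk fingerprints are all hypothetical choices made for the example, and a practical system may instead use variable-size chunking or a different fingerprint.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory chunk store illustrating de-duplication."""

    def __init__(self, chunk_size=4):
        # chunk_size is an assumed, deliberately tiny value for illustration.
        self.chunk_size = chunk_size
        # Maps a chunk fingerprint to the single stored copy of that chunk.
        self.chunks = {}

    def write(self, data: bytes) -> list:
        """Split data into fixed-size chunks and return a list of references.

        A chunk whose fingerprint is already present is not stored again;
        only a reference to the existing copy is recorded.
        """
        refs = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if no identical chunk is already held.
            self.chunks.setdefault(fingerprint, chunk)
            refs.append(fingerprint)
        return refs

    def read(self, refs) -> bytes:
        """Reconstruct the full-length content by following the references."""
        return b"".join(self.chunks[fingerprint] for fingerprint in refs)


store = DedupStore(chunk_size=4)
refs = store.write(b"ABCDABCDWXYZ")   # the chunk "ABCD" repeats once
restored = store.read(refs)           # full-length form of the content
```

In this sketch the twelve input bytes yield three references but only two stored chunks, because the repeated chunk is represented by a second reference to the first copy. Every write consults a single shared fingerprint table, which is a simplification adopted for the example.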
Systems employing de-duplication can experience performance issues when applied to large-scale storage systems. To address such issues, systems built for large-scale storage are generally designed to adopt a scale-out strategy in which separate hardware operates independently on separate sub-regions of the storage. Independent operation is necessary so that messaging overheads, lock delays, and blocking waits do not grow too large. However, de-duplication requires dependent operation across its entire span, since a repeated chunk may be stored anywhere within that span. This requirement can create blocking delays that degrade scalability.