Computing systems store and manage hierarchies of content units or data items. Each content unit in such a hierarchy, or storage space, has a path or location. The paths all share the same namespace, and the content units can be accessed by referring to their paths. Often, portions of one storage space need to be copied to another storage space. That is, a set of content units at respective paths in a source storage space may need to be copied to a target storage space sharing the same namespace. For example, a directory in a remote filesystem may need to be copied to a directory in a local filesystem.
Depending on the application or type of storage, identical instances of a content unit might be stored at multiple paths in a storage space. In the case of filesystem files, there may be multiple files containing the same content but at different full paths in the filesystem. For instance, the same file content “X” might be stored at “\A\B” and at “\A\C\D”. Files “B” and “D” store the same content, but at different locations and under different file names.
When duplicating a set of paths and respective content units from a source storage space to a target storage space, the duplication of content units may be inefficient. The same content may be transferred from source to target multiple times. Referring to the example above, it would be redundant and wasteful to copy both files “B” and “D” from source to target when they contain the same content. As the inventors have observed, another inefficiency may occur when content units of a portion of the source storage space that is to be transferred to the target storage space already exist at the target storage space. For instance, suppose the content units at “\A\B” and “\A\C” are to be transferred, perhaps as a package, and the target storage space already contains the same content as “\A\B”, yet at a different local location such as “\D\E\F”. The inventors have observed that transferring the content unit at “\A\B” is then potentially avoidable, since the same content is already available in the target storage space (at “\D\E\F”).
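The two inefficiencies above can be illustrated with a minimal sketch. It assumes that content units are modeled as an in-memory mapping from path to bytes and that a SHA-256 digest is used to recognize identical content; both are assumptions made for illustration only. The function plan_transfer is hypothetical. It lists the source paths whose content must actually be sent and maps every other source path to a target path from which identical content can be copied locally.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    """Identify a content unit by the SHA-256 hash of its bytes (an assumption)."""
    return hashlib.sha256(content).hexdigest()


def plan_transfer(
    source: Dict[str, bytes], target: Dict[str, bytes]
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: decide which source paths must actually be sent.

    Returns (transfer, reuse): `transfer` lists source paths whose content is
    not yet available at the target; `reuse` maps every other source path to a
    target path that holds (or will hold) identical content.
    """
    # Digest -> a path in the target storage space where that content can be read.
    available: Dict[str, str] = {}
    for path in sorted(target):
        available.setdefault(digest(target[path]), path)

    transfer: List[str] = []
    reuse: Dict[str, str] = {}
    for path in sorted(source):
        d = digest(source[path])
        if d in available:
            # Same content already exists at the target (or is already scheduled
            # for transfer): copy it locally instead of sending it again.
            reuse[path] = available[d]
        else:
            transfer.append(path)
            # The namespace is shared, so after the transfer this content will
            # exist at the same path on the target; later duplicates reuse it.
            available[d] = path
    return transfer, reuse


if __name__ == "__main__":
    # Content already present at the target is not resent.
    print(plan_transfer({r"\A\B": b"X", r"\A\C": b"Y"}, {r"\D\E\F": b"X"}))
    # Duplicate content within the source is sent only once.
    print(plan_transfer({r"\A\B": b"X", r"\A\C\D": b"X"}, {}))
```

In the first call, only “\A\C” is sent and “\A\B” is filled from “\D\E\F”. In the second call, with an empty target, “X” is sent once for “\A\B” and then copied locally to “\A\C\D”. This is a plain content-addressed comparison chosen for illustration, not the shallow cache technique itself.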
Discussed below are techniques related to using a shallow cache to efficiently transfer packages or sets of content units or data items from source storage spaces to target storage spaces by leveraging existing content at the target storage spaces.