Computing devices generally store data as files in a data storage system, which stores and indexes the file content for later retrieval. The index is typically represented as a tree-like hierarchy of directories, also called folders. Each directory groups zero or more files and sub-directories. The hierarchy has one root node (the only directory with no parent directory), zero or more intermediate nodes (sub-directories), and zero or more leaf nodes (files, and directories with no sub-directories). A file hierarchy can be packaged, with or without compression, into an archive file, which resides in a file system like a file but contains files and sub-directories like a directory. An archive can therefore be viewed as a leaf node, as an intermediate node, or as both.
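To make the node roles concrete, the following Python sketch models a hierarchy as linked nodes and reports each node's role. The `Node` class, its `kind` labels, and the `roles` and `walk` helpers are hypothetical names chosen for illustration, not part of any particular file system API. Under the assumption made here, an archive that holds entries is reported as both a leaf and an intermediate node, mirroring the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)  # identity equality avoids recursive parent/child comparison
class Node:
    """One entry in a hypothetical file hierarchy."""

    name: str
    kind: str  # "file", "directory", or "archive" (illustrative labels)
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach child under this node; plain files cannot contain entries."""
        if self.kind == "file":
            raise ValueError(f"file {self.name!r} cannot contain entries")
        child.parent = self
        self.children.append(child)
        return child

    def roles(self) -> tuple[str, ...]:
        """Return the node's role(s) in the tree."""
        if self.parent is None:
            return ("root",)
        if self.kind == "archive" and self.children:
            # Stored in the file system like a file, but holds entries like a directory.
            return ("leaf", "intermediate")
        return ("intermediate",) if self.children else ("leaf",)


def walk(node: Node, depth: int = 0) -> Iterator[tuple[int, Node]]:
    """Yield (depth, node) pairs in pre-order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Example hierarchy: one root, sub-directories, files, and an archive.
root = Node("/", "directory")
docs = root.add(Node("docs", "directory"))
docs.add(Node("report.txt", "file"))
bundle = root.add(Node("backup.zip", "archive"))
bundle.add(Node("old.txt", "file"))
empty = root.add(Node("empty", "directory"))

for depth, node in walk(root):
    print("  " * depth + f"{node.name} [{node.kind}: {', '.join(node.roles())}]")
```

In this example `backup.zip` is the only entry reported with two roles, while the empty directory `empty` is a leaf even though it is a directory.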
A set of files may be replicated within a data storage system or from one data storage system to another. In some instances, the replicated data is unchanged from copy to copy; in others, it is modified. A modification may be as small as one or two altered bits in a single file, or it may be far more extensive. Generally, modifications fall into three types: changes to the contents of individual files; addition or deletion of files (including changes to file names); and addition or deletion of directories (including changes to directory names). Even when a replicated file hierarchy has been modified, however, the replica may still share similarities with the original set of files.
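The three types of modification can be illustrated with a small comparison routine. This is a minimal sketch, assuming each hierarchy has already been flattened into a hypothetical `Snapshot` holding a path-to-content mapping for files and a set of directory paths. `classify_changes` and `unchanged_fraction` are illustrative names, and the fraction of byte-identical files is only one crude way to express how similar a replica remains to its original.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical flattened view of a file hierarchy."""

    files: dict[str, bytes]  # file path -> file contents
    dirs: frozenset[str]  # directory paths


def classify_changes(original: Snapshot, replica: Snapshot) -> dict[str, set[str]]:
    """Group path-level differences by modification type.

    Only paths are compared, so a renamed file or directory appears as
    one deletion plus one addition.
    """
    old_paths, new_paths = set(original.files), set(replica.files)
    shared = old_paths & new_paths
    return {
        # 1. Changes to individual file contents.
        "content_changed": {p for p in shared if original.files[p] != replica.files[p]},
        # 2. Addition or deletion of files (including renames).
        "files_added": new_paths - old_paths,
        "files_deleted": old_paths - new_paths,
        # 3. Addition or deletion of directories (including renames).
        "dirs_added": set(replica.dirs - original.dirs),
        "dirs_deleted": set(original.dirs - replica.dirs),
    }


def unchanged_fraction(original: Snapshot, replica: Snapshot) -> float:
    """Fraction of original files present at the same path with identical bytes."""
    if not original.files:
        return 1.0
    same = sum(1 for p, data in original.files.items() if replica.files.get(p) == data)
    return same / len(original.files)


original = Snapshot(
    files={"a/x.txt": b"hello", "a/y.txt": b"data", "b/z.txt": b"keep", "b/w.txt": b"same"},
    dirs=frozenset({"a", "b", "old"}),
)
replica = Snapshot(
    files={"a/x.txt": b"hellp", "a/y2.txt": b"data", "b/z.txt": b"keep",
           "b/w.txt": b"same", "c/new.txt": b"!"},
    dirs=frozenset({"a", "b", "c"}),
)
changes = classify_changes(original, replica)
print(changes)
print(unchanged_fraction(original, replica))
```

Here the rename of `a/y.txt` to `a/y2.txt` is reported as a deletion plus an addition, and half of the original files survive at the same path with identical contents. Detecting renames or partial overlap within edited files would need content-based matching, which this sketch deliberately leaves out.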