Hash-based directed acyclic graphs ("HDAGs") can be used to represent a structured document collection of computer data, e.g., a directory structure. In particular, HDAGs can be used to represent structured document collections in archival data storage applications and synchronization applications in a computer apparatus. One property that makes HDAGs advantageous for representing structured document collections, particularly in on-line data storage, archival storage, and synchronization applications, is the automatic structure-sharing property. Under this property, identical portions of data are not duplicated in each of the HDAGs that contain them. Examples of such portions include subdirectories that have identical content but belong to different filesystems, and common portions of different files, where each filesystem or file is represented by a separate HDAG.
The automatic structure-sharing property of HDAGs can be illustrated with the following example. Consider two filesystems, each located on a separate computer and each including an identical subdirectory. The HDAG for each filesystem can be built independently, without any communication between the two computers. As long as the same algorithm is used to map each filesystem to an HDAG, the resulting HDAGs (one for each filesystem) will have an identical, shared substructure that corresponds to the shared subdirectory. This property is advantageous in storage and synchronization applications because, e.g., the shared substructure needs to be stored or transferred only once.
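The example above can be sketched in Python. This is a minimal illustration under assumed conventions, not the encoding of any particular HDAG system. It assumes SHA-256 as the hash, a one-byte tag (`F` for a file, `D` for a directory), and, for a directory, a JSON list of sorted (name, child hash) pairs. The function `build_hdag` and the sample filesystems `fs_a` and `fs_b` are hypothetical. Each filesystem is mapped into its own node store without any reference to the other, and the two stores still end up holding the same nodes for the common subdirectory.

```python
import hashlib
import json

# ASSUMED encoding, for illustration only: b"F" + contents for a file node,
# b"D" + JSON list of sorted [name, child_hash] pairs for a directory node.
def build_hdag(tree, store):
    """Map a directory tree to HDAG nodes and return the root node's hash.

    tree  -- bytes for a file, or a dict {name: subtree} for a directory
    store -- dict that receives {hex digest: serialized node bytes}
    """
    if isinstance(tree, bytes):
        # Leaf node: a tag byte followed by the file contents.
        node = b"F" + tree
    else:
        # Interior node: sorted (name, child hash) pairs, so the encoding
        # depends only on the directory's contents, not on traversal order.
        entries = [[name, build_hdag(child, store)]
                   for name, child in sorted(tree.items())]
        node = b"D" + json.dumps(entries).encode("utf-8")
    digest = hashlib.sha256(node).hexdigest()
    store[digest] = node  # identical nodes always land on the same key
    return digest

# Two hypothetical filesystems, built independently, sharing one subdirectory.
fs_a = {"docs": {"a.txt": b"hello", "b.txt": b"world"}, "x.bin": b"\x00"}
fs_b = {"backup": {"a.txt": b"hello", "b.txt": b"world"}, "y.txt": b"other"}

store_a, store_b = {}, {}
root_a = build_hdag(fs_a, store_a)
root_b = build_hdag(fs_b, store_b)

shared = set(store_a) & set(store_b)
print(len(store_a), len(store_b), len(shared))  # 5 5 3
```

In this sketch, a subdirectory's name is recorded in its parent's entry rather than in the subdirectory's own node. The shared subdirectory therefore hashes identically even though it is named `docs` in one filesystem and `backup` in the other. Merging the two stores yields 7 distinct nodes rather than 10.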
Another property that makes HDAGs advantageous for representing structured document collections, particularly in archival storage and synchronization applications, is the self-assembly property. Under this property, the parent-child relationships among an unordered set of nodes from an HDAG can be determined without any additional information, because each node identifies its children by their hashes.
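Self-assembly can be sketched in the same way. The sketch again uses an assumed encoding (SHA-256, `F`/`D` tag bytes, JSON lists of child hashes) and hypothetical helper names (`digest`, `file_node`, `dir_node`, `assemble`). The nodes are supplied in arbitrary order. `assemble` rehashes each node to obtain its identifier, reads the child hashes embedded in each directory node to recover the edges, and reports as roots the nodes that no other node references.

```python
import hashlib
import json

# ASSUMED encoding, for illustration only: b"F" + bytes for a file,
# b"D" + JSON list of [name, child_hash] pairs for a directory.
def digest(node: bytes) -> str:
    return hashlib.sha256(node).hexdigest()

def file_node(data: bytes) -> bytes:
    return b"F" + data

def dir_node(children: dict) -> bytes:
    pairs = [[name, digest(child)] for name, child in sorted(children.items())]
    return b"D" + json.dumps(pairs).encode("utf-8")

def assemble(nodes):
    """Recover parent-child edges from an unordered collection of nodes.

    Returns (edges, roots): edges maps each directory's hash to its
    (name, child_hash) pairs; roots are hashes no other node references.
    """
    by_hash = {digest(n): n for n in nodes}  # a node's ID is its own hash
    edges, referenced = {}, set()
    for h, node in by_hash.items():
        if node[:1] == b"D":
            pairs = [tuple(p) for p in json.loads(node[1:].decode("utf-8"))]
            edges[h] = pairs
            referenced.update(child for _, child in pairs)
    roots = sorted(set(by_hash) - referenced)
    return edges, roots

f_a, f_b, f_x = file_node(b"hello"), file_node(b"world"), file_node(b"\x00")
sub = dir_node({"a.txt": f_a, "b.txt": f_b})
root = dir_node({"docs": sub, "x.bin": f_x})

# Nodes supplied in arbitrary order, with no parent/child metadata.
edges, roots = assemble([f_b, root, f_x, f_a, sub])
print(roots == [digest(root)])  # True
print(edges[digest(sub)] == [("a.txt", digest(f_a)), ("b.txt", digest(f_b))])  # True
```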
Although HDAGs can be used in archival storage and synchronization applications, their use in these applications can raise data security issues. In particular, unencrypted HDAGs are potentially vulnerable to snooping when the storage media on which they are stored, or the communication media over which they are transferred, are insecure. Hence, there is a need for a system and method that overcome one or more of the drawbacks identified above.