A Merkle tree is a tree data structure in which every node is referenced by a hash. The hash is created by hashing the contents of the node, including any children the node may have, so the hash at each node depends on the hashes of all of its direct descendants (children). FIG. 1 illustrates an example of a Merkle tree structure, as presently known. As shown in diagram 100, a top-level hash has children Hash_0 and Hash_1, which respectively have children Hash_0-0 and Hash_0-1, and Hash_1-0 and Hash_1-1. The hash at each level depends on the hash value or values below it. If any hash changes, its parent hash also changes, and these changes percolate up to the top of the tree. The hash value of a node may change because the data at the node changes or because any of its child nodes is added, deleted, or changed. Conversely, if a hash at a certain level has not changed, no hash or data below it has changed either. This is the power of Merkle trees: they provide an extremely efficient method of determining whether large sets of data have remained unchanged. These properties therefore allow a system that changes over some period of time to be stored very efficiently.
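The bottom-up hashing and upward percolation of changes described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: it assumes SHA-256 as the hash function, forms each parent hash by hashing the concatenation of its children's hex digests, and uses a hypothetical helper `build_fig1_tree` that mirrors the four-leaf layout of FIG. 1.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash arbitrary bytes with SHA-256 (an assumed choice of hash function)."""
    return hashlib.sha256(data).hexdigest()


def build_fig1_tree(blocks: dict[str, bytes]) -> dict[str, str]:
    """Compute every hash in a two-level tree shaped like FIG. 1.

    `blocks` maps the leaf labels "0-0", "0-1", "1-0", "1-1" to their data.
    Each parent hash is the hash of its children's hashes concatenated,
    so it depends on everything beneath it.
    """
    hashes = {f"Hash_{label}": sha256_hex(data) for label, data in blocks.items()}
    for p in ("0", "1"):
        hashes[f"Hash_{p}"] = sha256_hex(
            (hashes[f"Hash_{p}-0"] + hashes[f"Hash_{p}-1"]).encode()
        )
    hashes["Top_Hash"] = sha256_hex((hashes["Hash_0"] + hashes["Hash_1"]).encode())
    return hashes


# Change only the data under Hash_1-1 and see which hashes differ.
before = build_fig1_tree({"0-0": b"a", "0-1": b"b", "1-0": b"c", "1-1": b"d"})
after = build_fig1_tree({"0-0": b"a", "0-1": b"b", "1-0": b"c", "1-1": b"D"})

changed = sorted(k for k in before if before[k] != after[k])
print(changed)  # ['Hash_1', 'Hash_1-1', 'Top_Hash']
```

Only the modified leaf and its ancestors change; the unchanged Hash_0 confirms, with a single comparison, that nothing in its subtree was modified.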
Merkle trees are generally built from the bottom up, i.e., the hash values of children determine the hash of their parent, and so on up the tree. As a consequence, a Merkle tree has only downward-pointing references, because the value of a node at one level depends only on the node itself and its immediate child nodes. The lack of any upward-pointing references limits its use in many applications. For example, because a Merkle tree has only downward references, finding the parent that contains a given child's hash requires querying all of the data in the tree. This can be a very expensive query, especially in enterprise-level applications with millions to billions of records. The problem is exacerbated in databases with multiple levels, such as tree-based file systems, since finding a grandparent (or any higher-level ancestor) requires a separate expensive query for each level of ancestry.
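The cost of having only downward references can be illustrated with a simplified in-memory model. In this sketch, which is an assumption for illustration only, each stored record is a hypothetical `Node` holding its own hash and its children's hashes, and FIG. 1 labels stand in for real hash values. The hypothetical helper `find_parent` must scan records until it finds one listing the child, and `find_ancestors` repeats that full query once per level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stored record: a node's hash plus its children's hashes.

    Only downward references exist; nothing records a node's parent."""
    hash: str
    children: list[str] = field(default_factory=list)


def find_parent(store: list[Node], child_hash: str) -> tuple[Node | None, int]:
    """Linear scan: return the parent of child_hash and the number of records examined."""
    examined = 0
    for node in store:
        examined += 1
        if child_hash in node.children:
            return node, examined
    return None, examined


def find_ancestors(store: list[Node], leaf_hash: str) -> tuple[list[str], int]:
    """Walk upward one level at a time; each level costs another scan of the store."""
    ancestors, total = [], 0
    current = leaf_hash
    while True:
        parent, examined = find_parent(store, current)
        total += examined
        if parent is None:  # the root has no parent; confirming this needs a full scan
            return ancestors, total
        ancestors.append(parent.hash)
        current = parent.hash


store = [
    Node("Hash_0-0"), Node("Hash_0-1"), Node("Hash_1-0"), Node("Hash_1-1"),
    Node("Hash_0", ["Hash_0-0", "Hash_0-1"]),
    Node("Hash_1", ["Hash_1-0", "Hash_1-1"]),
    Node("Top_Hash", ["Hash_0", "Hash_1"]),
]
ancestors, cost = find_ancestors(store, "Hash_1-1")
print(ancestors, cost)  # ['Hash_1', 'Top_Hash'] 20
```

Even for this seven-record tree, locating the two ancestors of one leaf examines 20 records; with millions to billions of records and deeper trees, each additional level of ancestry adds another scan of the entire store.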
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, and Data Domain Restorer are trademarks of EMC Corporation.