In a networked environment of nodes (e.g., servers, data centers), data may be replicated on multiple nodes in order to support data migration and disaster recovery (e.g., failover and failback operations). Under many circumstances, data stored at different nodes may need to be compared in order to support replication, migration or disaster recovery. A popular technique for comparing data involves fingerprinting. Fingerprinting refers to a technique in which a fingerprinting algorithm is applied to data to map the data to a shorter fingerprint (e.g., a bit string) that identifies the data. Multiple pieces of data may be compared by first generating a fingerprint for each piece of data and then comparing the fingerprints.
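By way of illustration only, the fingerprint comparison described above may be sketched as follows. The sketch assumes SHA-256 as the fingerprinting algorithm, which the description does not specify, and the names fingerprint, same_data, primary, replica and stale are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Map data of any length to a fixed-length fingerprint.

    SHA-256 is an assumed choice for illustration; any fingerprinting
    algorithm that yields a short identifying bit string could be used.
    """
    return hashlib.sha256(data).hexdigest()


def same_data(a: bytes, b: bytes) -> bool:
    """Compare two pieces of data by comparing their fingerprints."""
    return fingerprint(a) == fingerprint(b)


# Hypothetical pieces of data held at different nodes.
primary = b"block contents stored on node A"
replica = b"block contents stored on node A"
stale = b"block contents stored on node B"

print(same_data(primary, replica))  # fingerprints match
print(same_data(primary, stale))    # fingerprints differ
```

In such an arrangement, only the fixed-length fingerprints need to be exchanged between nodes, although each node must still read its entire piece of data in order to compute the fingerprint.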
While fingerprinting allows pieces of data to be compared without having to compare each individual segment making up a piece of data, fingerprinting is still resource-intensive and computation-intensive. For example, where a piece of data is very large, performing a fingerprinting algorithm on the piece of data may require numerous computations. Likewise, when several pieces of data are being compared at once, the fingerprinting and comparison process may utilize significant resources and take an extensive amount of time to complete. Additionally, fingerprinting cannot guarantee the absence of collisions (i.e., different pieces of data that map to the same fingerprint, which may be generated maliciously), which makes this technique subject to security attacks.
Therefore, there is a need for an improved approach to uniquely identify data in a networked environment for purposes of data replication, data migration and disaster recovery.