Disaster recovery (DR) refers to processes, policies and procedures related to preparing for recovery and continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.
Data synchronization is an automated process intended to keep replicated copies of data consistent with one another and up to date. Existing data synchronization techniques are primarily host-based or appliance-based. Data synchronization is significant in applications such as high-availability clusters, disaster recovery and mobile computing. By way of example, consider a situation where a user must manually copy files from one machine to another, which typically involves copying entire files or directories rather than just the most recent changes.
Accordingly, goals of data synchronization techniques include keeping data updated in multiple replicas so that time required to restore a setup after a disaster or failure is minimized. Typically, data replication is achieved through storage level replication in which disk (or storage medium) contents are kept identical through incremental synchronization. There are, however, issues with storage level replication, such as, for example:
    Storage level replication is typically done at a coarse time granularity and not attempted in real-time because it tends to be costly in terms of computing resources; and
    Storage level replication cannot help in replication of service instances because it only replicates storage contents but not memory contents (service state). Service replication requires both disk and memory contents to be replicated.
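By way of illustration only, the incremental synchronization referred to above can be sketched as a block-by-block comparison in which only the blocks that differ are copied to the replica. The function names, the tiny block size and the use of SHA-256 digests are assumptions made for this sketch and are not taken from any particular storage product.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def incremental_sync(primary: bytes, replica: bytearray,
                     block_size: int = BLOCK_SIZE) -> int:
    """Make replica identical to primary, copying only differing blocks.

    Returns the number of blocks that had to be copied.
    """
    src = block_digests(primary, block_size)
    dst = block_digests(bytes(replica), block_size)
    copied = 0
    for idx, digest in enumerate(src):
        # A block is copied if the replica lacks it or its digest differs.
        if idx >= len(dst) or dst[idx] != digest:
            start = idx * block_size
            replica[start:start + block_size] = primary[start:start + block_size]
            copied += 1
    # Drop any trailing replica content that the primary no longer has.
    del replica[len(primary):]
    return copied
```

Such a sketch copies storage contents only. Any in-memory service state on the primary is untouched by it, which is the second issue noted above.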
Network-based replication, on the other hand, replicates inbound network traffic destined to a server. In modern data centers, almost all modifications to disk contents (or the generation of new content on the disk) of a server take place through inbound network traffic in the form of telnet or ssh sessions, or connections to specific applications running on the server. Therefore, replicating inbound network traffic to a server should ensure both disk replication as well as memory (state) replication. These two, in turn, can help in realizing service level replication.
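The reasoning above can be illustrated with a toy model in which every inbound request is delivered to both a primary and a replica. The ToyServer class, its "PUT key value" request format and the replicate_inbound helper are hypothetical, and the sketch assumes that request handling is deterministic, so that identical input yields identical disk and memory contents.

```python
from typing import Dict, Iterable, List, Tuple


class ToyServer:
    """Hypothetical server with persistent and in-memory state."""

    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}      # stands in for disk contents
        self.sessions: Dict[str, int] = {}  # stands in for service state

    def handle(self, client: str, request: str) -> None:
        # Memory state: count the requests seen from each client.
        self.sessions[client] = self.sessions.get(client, 0) + 1
        # Disk state: a "PUT key value" request writes to storage.
        parts = request.split(" ", 2)
        if len(parts) == 3 and parts[0] == "PUT":
            self.disk[parts[1]] = parts[2]


def replicate_inbound(requests: Iterable[Tuple[str, str]],
                      servers: List[ToyServer]) -> None:
    """Deliver every inbound request to every server, in arrival order."""
    for client, request in requests:
        for server in servers:
            server.handle(client, request)
```

Because both instances see the same requests in the same order, their disk and session dictionaries end up equal without any separate storage level copy.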
The goals of connection level network traffic replication may not be limited to keeping multiple replicas in sync for disaster recovery or high availability. This may also be useful for other scenarios in enterprise data centers. A typical data center scenario can include multiple replicas in different environments, which may provide a challenge in keeping all of the replicas in sync at all times. For instance, there can be multiple instances of the same multi-tier application: one in a production environment, another one in a test environment or staging environment, and yet another one at the DR site. A test environment may make use of synthetic workloads to drive the load, but those do not appropriately represent production workload. As a result, most production performance problems cannot be recreated in the test environment. Replication of one or more network flows arriving at the production environment to the test environment can help drive the application load with production workload. This, in turn, can help capture the production request mix and recreate production problems in the test environment.
Existing approaches to network traffic replication using port mirroring or switched port analyzer (SPAN) cannot be used because they require the intended destination of replicated packets to be directly connected to the network switch at which replication takes place, as the replicated packets cannot be routed using regular routing protocols (because they have duplicate layer-2 and layer-3 addresses). In an enterprise data center, a test environment is typically not connected to the same switch as the production environment. Furthermore, traditional port mirroring replicates all incoming or outgoing packets at a particular port, and therefore provides neither flow level granularity nor connection management to maintain state.
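The difference in granularity can be sketched as follows. The FlowKey and Packet types and the two functions are hypothetical names introduced for illustration: mirror_port models the copy-everything behavior of port mirroring, while replicate_flows models selection of individual flows identified by the usual five-tuple. The sketch covers selection only and does not address routing of the copies or connection state.

```python
from dataclasses import dataclass
from typing import Iterable, List, NamedTuple, Set


class FlowKey(NamedTuple):
    """Five-tuple identifying a network flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class Packet:
    flow: FlowKey
    payload: bytes


def mirror_port(packets: Iterable[Packet]) -> List[Packet]:
    """Port mirroring: every packet seen at the port is copied."""
    return list(packets)


def replicate_flows(packets: Iterable[Packet],
                    selected: Set[FlowKey]) -> List[Packet]:
    """Flow level replication: copy only packets of selected flows."""
    return [p for p in packets if p.flow in selected]
```

With this model, a single application connection arriving at the production environment can be chosen for replication while unrelated traffic on the same port is left alone.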
On the other hand, service or session fail-over is typically handled by an application-specific mechanism that detects the failure of a primary instance and redirects new connections to the fail-over instance. However, existing connections are dropped rather than maintained, and this can lead to loss of state and/or down-time.