The invention relates to systems and methods for replicating distributed data over a network.
The advent of client-server technology has led to widespread sharing and communication of data over one or more computer networks, including local area networks and wide area networks such as the Internet. In client-server systems, users work through network-attached personal computers and workstations to process data and run programs that may be stored in the network's mass storage systems. The personal computers and workstations, operating as clients, download the data and programs from the network mass storage systems for processing and upload the resulting data back to those systems. The ubiquity of client applications has led to an explosion in the amount of data that must be stored.
To meet growing storage demand, new storage architectures have been developed, notably Network Attached Storage (NAS) and Storage Area Networks (SANs). In a NAS, intelligent storage devices connect directly to the network and are dedicated to serving files to clients in a client/server relationship. Each NAS device has its own processor running an operating system or microkernel, and it processes file I/O protocols, such as NFS, to manage the transfer of data between itself and its clients. To applications running on the network, the NAS device appears to be a server; to a client, it appears as a large file system.
In a SAN environment, the storage function is detached from the network servers, centralized, and managed as a separate network resource. SANs based on Fibre Channel have recently emerged as one of the highest-performance data communications environments available for interconnecting servers and storage. Running at gigabit speeds and built on open standards, SANs offer better scalability, fault recovery and general manageability than conventional LAN-based client-server approaches.
Data stored in the data storage devices needs to be backed up to protect against data corruption and to permit recovery of the data exactly as it existed at a particular point in time in the event of a system failure or inadvertent data loss. The data is typically backed up automatically on a daily or other periodic basis and stored on tape or optical archive media. However, during a backup operation, an application may access and change the data being backed up.
Because denying access to data during the backup operation is normally unacceptable, the file server typically captures a snapshot of changed data during the backup. During the snapshot operation, the file server intercepts write operations to the data and backs up the unchanged data before allowing each write to modify it. During the copy operation, the file server reads each data block to be copied and writes it to the target storage device, inserting the snapshot data bytes in their proper locations to maintain data coherency. Because each block is first read into the file server and then written back out, the data must travel across a storage channel twice. As a result, the data copy operation consumes significant file server processor and memory resources as well as SAN bandwidth.
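The snapshot-during-backup process described above can be illustrated as a copy-on-write scheme. The following is a minimal sketch under stated assumptions, not the method of the invention: the names Volume, SnapshotServer, start_snapshot, write, backup, and channel_bytes are hypothetical; a volume is assumed to be an in-memory list of equal-size byte blocks; and "channel traffic" is modeled as a simple byte counter. It shows the file server preserving each unchanged block before the first write to it, substituting the preserved blocks during the copy so that the target holds a point-in-time image, and why every copied byte crosses the storage channel twice.

```python
class Volume:
    """Hypothetical block device: a list of equal-size byte blocks held in memory."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class SnapshotServer:
    """Hypothetical file server that mediates application writes and backup copies."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}          # block index -> original contents at snapshot time
        self.active = False      # True while a snapshot/backup is in progress
        self.channel_bytes = 0   # bytes the server moves across the storage channel

    def start_snapshot(self):
        """Begin a snapshot: later writes will preserve original blocks first."""
        self.saved.clear()
        self.active = True

    def write(self, index, data):
        """Intercept a write; save the unchanged block once before overwriting it."""
        if self.active and index not in self.saved:
            self.saved[index] = self.volume.blocks[index]
        self.volume.blocks[index] = data

    def backup(self, target):
        """Copy every block through the server: one read, then one write."""
        for index, current in enumerate(self.volume.blocks):
            # Use the preserved original if the block changed after the snapshot began.
            block = self.saved.get(index, current)
            self.channel_bytes += len(block)   # read from source storage into the server
            target.blocks[index] = block
            self.channel_bytes += len(block)   # write from the server to target storage
        self.active = False


if __name__ == "__main__":
    vol = Volume([b"AAAA", b"BBBB", b"CCCC"])
    srv = SnapshotServer(vol)
    srv.start_snapshot()
    srv.write(1, b"XXXX")        # an application modifies data during the backup
    tgt = Volume([b""] * 3)
    srv.backup(tgt)
    print(tgt.blocks)            # [b'AAAA', b'BBBB', b'CCCC'] (point-in-time image)
    print(vol.blocks)            # [b'AAAA', b'XXXX', b'CCCC'] (live data keeps the write)
    print(srv.channel_bytes)     # 24: each of the 12 data bytes crosses the channel twice
```

In this model the application's write is never blocked; it only pays the cost of one extra block copy the first time a block is modified. The counter makes the bandwidth cost explicit: copying N bytes through the server generates 2N bytes of channel traffic.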