Computers have become essential elements in the day-to-day operation of many enterprises such as corporations, governments, and educational institutions. Many of these computers operate in cooperation with other computers and data processing devices by way of communication networks. Networked computers routinely accept, process, display, and transmit data when running software applications such as database applications, stock trading applications, computer aided design (CAD) applications, data analysis and modeling applications, and order processing applications. Data residing on networked computers may be essential to the operation of a project or enterprise. Furthermore, the data may be difficult, or impossible, to replace if it becomes lost, damaged, or corrupted.
Enterprises may utilize data backup, or archiving, technologies in order to reliably create duplicate data sets for use in the event that a primary, or master, data set becomes corrupted. Prior art data archiving techniques may employ creation of an entire duplicate data set at fixed intervals, for example, daily, weekly, or monthly. The archived data may be written to tape, to a separate hard drive, to CD-ROM, etc. When data files become large, archiving an entire master data file can take many hours. In addition, the archive can utilize almost one hundred percent of a network's bandwidth if the archive is saved to a remote device coupled to the network.
When data files change often, such as numerous times throughout a day, maintaining up-to-date data archives may become problematic due to the amount of time and network resources required to archive master data throughout the day. Failure to maintain up-to-date data archives can greatly increase the amount of time necessary to recover from a disaster such as a crashed hard drive, a fire, an act of sabotage, etc.
Prior art techniques may further attempt to archive data in ways other than replicating an entire data set or storage system. An example of an alternative backup technique is referred to as a transaction-based backup. A transaction-based backup involves the transmission of a high-level transaction to a remote file server in its entirety. High-level transactions are typically specific to a particular application such as a database application. Use of such an approach can therefore be database-engine specific, and may have to be implemented as part of the database application itself. As a result, this approach can be costly, since the particular application may have to be modified, and can further require large amounts of network bandwidth, since the entire high-level transaction is sent to the remote file server.
Another prior art technique is referred to as disk mirroring. Disk-based mirroring, as the name implies, involves replicating the contents of a disk on a remote device. Implementations of disk-based mirroring may replicate, or mirror, actual disk writes, including sector locations and sector data, from the primary server to a remote server. Use of disk mirroring requires very high reliability, high bandwidth communications networks as well as identical disk hardware on both the primary server and the remote server.
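By way of illustration only, the mirroring of sector-level writes described above may be sketched as follows. The sketch is a simplified, hypothetical model and not an implementation taken from any particular prior art system: the names `Disk`, `write_sector`, `mirrored_write`, and `SECTOR_SIZE` are assumptions introduced here, and the "remote" disk is an in-memory object rather than a device reached over a network.

```python
SECTOR_SIZE = 8  # assumed sector size in bytes; real disks use far larger sectors


class Disk:
    """Hypothetical in-memory stand-in for a disk addressed by sector number."""

    def __init__(self, sector_count: int) -> None:
        self.data = bytearray(sector_count * SECTOR_SIZE)

    def write_sector(self, sector: int, payload: bytes) -> None:
        # A sector write is only valid if it is exactly one sector long
        # and falls inside the disk.
        if len(payload) != SECTOR_SIZE:
            raise ValueError("payload must be exactly one sector long")
        start = sector * SECTOR_SIZE
        if sector < 0 or start + SECTOR_SIZE > len(self.data):
            raise IndexError("sector lies outside this disk")
        self.data[start:start + SECTOR_SIZE] = payload


def mirrored_write(primary: Disk, remote: Disk, sector: int, payload: bytes) -> None:
    """Apply the same (sector location, sector data) pair to both disks."""
    primary.write_sector(sector, payload)
    remote.write_sector(sector, payload)


if __name__ == "__main__":
    primary, remote = Disk(4), Disk(4)
    mirrored_write(primary, remote, 2, b"ABCDEFGH")
    print(primary.data == remote.data)
```

Because each write is replayed by raw sector number, a remote disk with a different sector count or sector size cannot accept every write, which is one way to see why such schemes may call for identical disk hardware on both servers.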
Still another replication technique used in the art is referred to as file-based mirroring. File-based mirroring involves the replication of files, or portions thereof, from a primary server to a remote server. File-based mirroring may include transmission of an entire file or may involve the transmission of file portions in order to conserve network bandwidth. When portions of files are transferred, problems can arise if a transmitted portion is lost, arrives out of order, or becomes corrupted. When a problem arises, prior art techniques may retransmit an entire data file, resulting in inefficient use of network bandwidth.
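By way of illustration only, the following sketch shows one way in which file portions could be indexed and checksummed so that a lost, reordered, or corrupted portion can be identified individually instead of triggering retransmission of the entire file. It is a minimal example under stated assumptions and does not describe any particular prior art system: the function names, the tuple layout of a "packet", and the four-byte `CHUNK_SIZE` are all hypothetical, and SHA-256 is used merely as a convenient error-detecting digest.

```python
import hashlib

CHUNK_SIZE = 4  # assumed portion size in bytes, kept tiny for readability


def make_packets(data: bytes, size: int = CHUNK_SIZE):
    """Split data into portions and tag each with its index and a digest."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(idx, chunk, hashlib.sha256(chunk).hexdigest())
            for idx, chunk in enumerate(chunks)]


def receive(packets, expected_count: int, good=None):
    """Keep portions whose digest matches; report indices still needed."""
    good = {} if good is None else dict(good)
    for idx, chunk, digest in packets:
        if hashlib.sha256(chunk).hexdigest() == digest:
            good[idx] = chunk  # the index restores order, so arrival order is irrelevant
    missing = [i for i in range(expected_count) if i not in good]
    return good, missing


def reassemble(good, expected_count: int) -> bytes:
    """Join verified portions in index order."""
    return b"".join(good[i] for i in range(expected_count))


if __name__ == "__main__":
    data = b"abcdefghij"
    packets = make_packets(data)
    # Simulate a corrupted portion 1 and a lost portion 2.
    damaged = [packets[0], (1, b"XXXX", packets[1][2])]
    good, missing = receive(damaged, len(packets))
    # Only the missing portions are requested again.
    good, missing = receive([packets[i] for i in missing], len(packets), good)
    print(reassemble(good, len(packets)) == data)
```

In this sketch only the portions whose indices are reported as missing are sent a second time, so the cost of an error is proportional to the damaged portion and not to the size of the file.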
What is needed is a data backup technique for maintaining up-to-date archives on an ongoing basis. In addition, the backup technique should efficiently use network bandwidth and should not be overly burdensome to the processing capabilities of the master or remote computers. Furthermore, the master computer should send updates to the remote site in substantially real-time, when feasible. In addition, the data backup technique should use data compression and error detection protocols in a manner that avoids the re-transmission of large volumes of data whenever an error occurs.