As companies today become more accustomed to storing important company information on their data network, the value of these networks and the data they store continues to grow. In fact, many companies now identify the data stored on their computer network as their most valuable corporate asset.
Today most backup systems operate by having the network administrator identify a time of day during which little or no network activity occurs. During this time the network administrator turns the network over to a backup system and the data files stored on the computer network are backed up, file by file, to a long term storage medium, such as a tape backup system. Typically the network administrator will back up once a week, or even once a day, to ensure that the backup files are current.
Although such a backup process may work well to create a copy of the data stored on the network, it is a time-consuming and labor-intensive process. Moreover, it is a cumbersome process that is often inappropriate in many environments. For example, as more and more computer networks begin to operate twenty-four hours a day, seven days a week, it is increasingly difficult for the system administrator to identify a block of time during which the majority of network resources may be turned over to the backup procedure. Moreover, as computerized network systems begin to store more information, as well as information that changes regularly during the course of the work day, the value of a backup system which backs up only once a week or once a day is reduced. In fact, many companies now rely on the corporate network to store almost all of their business information, and the loss of even a portion of the information stored on the network during the course of a day may result in a substantial cost for the company. Accordingly, systems which back up only periodically are of reduced value to a company.
Computer backups are performed using several strategies. The simplest entails a complete transfer of all data and meta-data (such as time stamps, ownership, and access rights) to a target. This approach is simple, but it redundantly transfers data already present on the target, potentially at high expense. Incremental backups, which transfer only the actual changes or a more manageable subset of the data, are also possible. Common mechanisms for determining an appropriate increment include archive bits and modification time stamps. Archive bits are set by the operating system on any change and reset by the backup software; however, because a reset by one backup system hides the change from any other, they preclude use by multiple backup systems, and they do not narrow down the type of change. Modification time stamps are set by the operating system but can sometimes be adjusted by user software.
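By way of illustration only, the modification-time-stamp mechanism described above can be sketched as follows. The function name `files_changed_since` and the convention of passing the time of the previous backup as seconds since the epoch are assumptions made for this sketch; they are not taken from any particular backup product.

```python
import os
from pathlib import Path


def files_changed_since(root, last_backup_time):
    """Return relative paths under root modified after last_backup_time.

    last_backup_time is a POSIX time stamp (seconds since the epoch);
    this convention is an assumption of the sketch.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the modification time stamp kept by the
            # operating system; user software can alter it (os.utime).
            if path.stat().st_mtime > last_backup_time:
                changed.append(path.relative_to(root).as_posix())
    return sorted(changed)
```

Because the selection relies entirely on the time stamp, a file whose time stamp has been set back by user software would be missed, which is the weakness noted above.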
Transfer mechanisms vary but often involve common file formats. Such file formats often intermix data and meta-data, which makes separating the meta-data for other processing expensive. The contents of such files are also often constrained to what is needed for a specific operating system.
Many software programs implementing such backups use only a single thread of instructions, which precludes multiple simultaneous reads and writes. Performing multiple I/O operations simultaneously, to or from the same or different end-points, speeds backups by increasing input/output (I/O) throughput and reducing the effects of latency. Parallel I/Os to the same end-point can have higher throughput when the operating system and firmware can re-order them for efficiency (as on spinning magnetic and optical disks, where they can execute in physical order) or overlap latency with command execution (notably on a network connection) to reduce its effect. Parallel I/Os to different end-points (such as a network target and local storage, or multiple mass storage devices as in a disk array) can approach the aggregate throughput of the devices, as opposed to the average throughput obtained when the I/Os are performed serially.
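As a minimal sketch of issuing multiple simultaneous reads and writes, the following copies a directory tree using a pool of worker threads. The function name `copy_tree_parallel` and the default of four workers are assumptions chosen for illustration; the actual benefit depends on the devices and operating system involved.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(source_root, target_root, max_workers=4):
    """Copy every file under source_root to target_root using several threads."""
    source_root = Path(source_root)
    target_root = Path(target_root)
    files = [p for p in source_root.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = target_root / src.relative_to(source_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 copies the data and attempts to preserve meta-data
        # such as modification time stamps.
        shutil.copy2(src, dst)
        return dst

    # Each worker thread performs its own read and write, so several
    # I/Os can be outstanding at once instead of one at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(copy_one, files))
```

Setting `max_workers=1` reduces the sketch to the single-threaded behavior described above, which makes the two approaches easy to compare on a given system.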
Moreover, although the current backup systems work well for putting data onto a long-term storage medium, they often store data sequentially onto media, such as magnetic tape, losing the file structure of the data and making it difficult to retrieve information without having to reinstall all the data previously stored on the tape. Thus, if a portion of the data is lost, it is often difficult to restore just the data that was lost, and often the system administrator is forced to decide whether it is worth the cost of retrieving the lost portion of the data.
Many backup systems do not provide an easy way to validate that the backup contents are usable (problems such as a disk or tape error can preclude this) and match the source.
U.S. Pat. No. 7,644,113 discloses systems and methods for continuous backup of data stored on a computer network. The system includes a synchronization process that replicates selected source data files stored on the network to create a corresponding set of replicated data files, called the target data files, that are stored on a backup server. This synchronization process builds a baseline data structure of target data files. In parallel with this synchronization process, the system includes a dynamic replication process that includes a plurality of agents, each of which monitors a portion of the source data files to detect and capture, at the byte level, changes to the source data files. Each agent may record the changes to a respective journal file, and as the dynamic replication process detects that the journal files contain data, the journal files are transferred or copied to the backup server so that the captured changes can be written to the appropriate ones of the target data files.
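For illustration, the general idea of recording byte-level changes in a journal and later writing them to target data files can be sketched as follows. This is not the implementation disclosed in the cited patent; the names `record_change` and `apply_journal`, the in-memory list used as a journal, and the dictionary of byte arrays standing in for target files are all assumptions made to keep the sketch self-contained.

```python
def record_change(journal, path, offset, data):
    """Append one byte-level change (file, position, new bytes) to the journal."""
    journal.append((path, offset, bytes(data)))


def apply_journal(journal, targets):
    """Write journaled changes to the targets, then empty the journal.

    targets maps a file path to a bytearray holding that file's
    replicated contents (a stand-in for files on a backup server).
    """
    for path, offset, data in journal:
        buf = targets.setdefault(path, bytearray())
        if len(buf) < offset:
            # Pad with zero bytes if the change lies beyond the current end.
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
    journal.clear()
```

A usage example under the same assumptions: starting from a target holding `b"hello world"`, recording the change `("a.txt", 6, b"there")` and applying the journal leaves the target holding `b"hello there"`, with only five bytes transferred rather than the whole file.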