1. Field of the Invention
The present invention relates to backup systems for computers. More specifically, the invention is an improved backup system minimizing resource utilization within the computer system, and maximizing recovery potential in the event of information losses.
2. Description of the Related Art
Recently, there have been technological advances in the capacity of data storage devices, such as hard drives, that enable data servers, network workstations, and roaming clients to carry increased amounts of information. Due to these advances, the once relatively simple process of keeping computer data backed up using relatively low cost, removable media has become more complex. An important task of conventional information technology professionals is to manage new backup function requirements in next generation network designs without significant disruption to network users.
Most traditional level backup methods and systems attempt to offer a degree of flexibility in scheduling backup operations to assist the information technology professional in reducing the disruptive effect of various aspects of these backup operations. These conventional products, however, typically increase the information system's exposure to data loss from the failure of a single storage device, such as a tape failure.
Most conventional backup products use what is known as a “traditional level backup”. The levels of backup are typically defined from zero to nine (0-9). For example, a “Level 0” backup is defined as the “epoch” or full backup; all data contained on a client information system is backed up during a Level 0 backup. Higher level backups operate on the principle that all data changes are backed up since the most recent backup of any lower and/or preceding level. For example, a Level 2 backup obtains and stores data that has been changed or created since the most recent of either the Level 0 or Level 1 backup.
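The level rule described above can be illustrated with a minimal Python sketch. The history format (a list of timestamp and level pairs) and the function name `reference_time` are assumptions made for illustration only; they do not come from any particular backup product.

```python
from typing import List, Optional, Tuple

# Each history entry is (timestamp, level). This shape is hypothetical.
Backup = Tuple[int, int]


def reference_time(history: List[Backup], level: int) -> Optional[int]:
    """Return the time of the most recent backup at any lower level.

    A Level 0 backup copies everything, so it has no reference time and
    None is returned. None is also returned when no lower-level backup
    has been taken yet.
    """
    if level == 0:
        return None
    lower = [t for (t, lvl) in history if lvl < level]
    return max(lower) if lower else None


# Level 0 at t=100, Level 1 at t=200, Level 2 at t=300, Level 1 at t=400.
history = [(100, 0), (200, 1), (300, 2), (400, 1)]
level2_since = reference_time(history, 2)  # most recent Level 0 or Level 1
level1_since = reference_time(history, 1)  # most recent Level 0
```

Under these assumptions, a Level 2 backup taken after this history would capture changes made since time 400, while a Level 1 backup would capture changes made since time 100.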
In a typical traditional level scheme, there is a tradeoff between stress on the network and client systems and exposure to storage device failure prior to or during a future restore process. More complex schedules can offer a significantly reduced network load, for example, but can also require more tape read requests to restore data completely. The need for more tapes in the restore process therefore increases the exposure to storage device failure. In a level backup scheme, for the purpose of future restore operations a single bad tape can lead to a substantial amount of lost data. On the other hand, employing fewer backup levels can improve restore time and reliability, but cost is increased in the form of downtime for networks and backup clients during backup operations.
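The tradeoff between schedule complexity and the number of tapes read at restore time can be made concrete with a short sketch. It assumes one backup per tape and a chronological history of (tape label, level) pairs; the labels and the function name `restore_chain` are hypothetical.

```python
from typing import List, Tuple


def restore_chain(history: List[Tuple[str, int]]) -> List[str]:
    """Return the tapes that must be read to restore, oldest first.

    Walks the history backward from the latest backup, keeping each tape
    whose level is strictly lower than the last tape kept, and stops at
    the most recent Level 0 (full) backup.
    """
    chain = []
    needed_below = None  # the next (older) tape kept must be below this level
    for label, level in reversed(history):
        if needed_below is None or level < needed_below:
            chain.append(label)
            needed_below = level
            if level == 0:
                break
    return list(reversed(chain))


# Two-level schedule: one full backup, then a Level 1 backup each day.
two_level = [("sun", 0), ("mon", 1), ("tue", 1), ("wed", 1)]
# Four-level schedule: each day backs up only changes since the day before.
four_level = [("sun", 0), ("mon", 1), ("tue", 2), ("wed", 3)]

two_level_tapes = restore_chain(two_level)    # ['sun', 'wed']
four_level_tapes = restore_chain(four_level)  # ['sun', 'mon', 'tue', 'wed']
```

In this sketch the two-level schedule needs two tapes to restore but re-sends more data each day, whereas the four-level schedule sends less data each day but depends on every one of four tapes being readable.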
Many conventional backup systems and methods utilize a 2 or 3 level backup scheme. This appears to be a level range where information technology professionals are comfortable with the above-described limitations of a traditional level backup. Level 0 backups are usually run on a weekly, bi-weekly, or monthly schedule, depending on factors such as the amount of data, the available network bandwidth, and the risk associated with potential loss of data. Systems and methods that place a comparatively higher value on data tend to perform a two level backup scheme. Systems and methods that are limited by any combination of factors such as the amount of data, the data change rate, the available backup window, or the rate that backup data can be transferred over the available network bandwidth, tend to use a three level or higher backup scheme.
In an attempt to resolve the inadequacies of a traditional level backup, several conventional systems and methods have evolved:
In a day-to-day backup system, only the data changes that have occurred since the most recent backup are taken from the backup client and written to a storage device. This has the benefit of reducing stress on the backup client system, but the number of storage devices that must be read to restore a data partition can increase dramatically over time. Periodic full backups through the network are required to maintain a reasonably low number of storage devices. In addition, a backup server in this type of system must be able to determine efficiently and exactly what tapes to read and what data to send to the backup client. If an efficient and exact system is not available, an additional burden is placed on the backup client during the restore process, because many copies of the same file may need to be transferred.
Some systems and methods attempt, on a day-to-day basis, to collocate related backup volumes on a single storage device. This approach reduces the tape complexity for restore operations, but it also increases the exposure if a single storage device, such as a single tape, fails. A substantial risk associated with these types of systems and methods is that many days' worth of backup data can be irretrievably lost.
One way to reduce the risk of data loss due to storage device failure is to create a copy of all original data storage devices. This may be known as tape mirroring. Many conventional backup systems and methods provide information technology professionals with the ability to replicate data easily, but storage device costs are doubled if the data is fully duplicated. Network and backup client loads in this model remain dictated by the level backup scheme that is chosen.
A method and system are therefore needed to manage backup operations and to minimize the adverse impact of these backup operations on data networks, backup clients and/or other network users. What is further needed is a method and system that can reduce the risk of data loss due to storage device failure. Design flexibility is required that permits an information technology professional to re-define a backup plan, achieve a higher level of reliability with backup operations, and possibly reduce the cost of storage devices, such as tapes.
The present invention is an improved backup system, maximizing the ability to recover data in the event of information loss, while minimizing utilization of computer system resources.
The backup system begins by sending a time stamp of zero to the computer system. At time zero, all files, directories, and meta information presently existing on the computer system are transferred to the backup system. Additionally, the client computer system sends the present time to the backup server, which becomes the parent time for the next successive backup operation. At the next backup time interval, the time stamp representing the previous backup is sent from the backup system to the computer system, prompting the computer system to send all files that have been modified since the last backup to the backup system. All metadata within the computer system is also sent. If any file is unable to be sent, for example, a file that has been opened by a user, the identity of that file is also sent to the backup server. The computer system sends the files to the backup server in a file stream comprising a sequence of files with metadata headers and footers defining the identity and boundaries of each file. With unique identifiers for every file within the file stream, the files may be transmitted in any order, and a separate file containing the location of files within the data stream is not required. This file stream is directed towards the cache of the backup system.
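A self-describing file stream of the kind described above might look like the following minimal sketch. The wire layout (length-prefixed JSON header, data, and JSON footer) and the names `write_entry` and `read_entries` are assumptions chosen for illustration; the description does not specify an encoding.

```python
import io
import json
import struct


def write_entry(out, file_id: str, data: bytes, mtime: int) -> None:
    """Append one file to the stream as header, data, and footer chunks."""
    header = json.dumps({"id": file_id, "mtime": mtime, "size": len(data)}).encode()
    footer = json.dumps({"id": file_id, "end": True}).encode()
    for chunk in (header, data, footer):
        out.write(struct.pack(">I", len(chunk)))  # 4-byte big-endian length
        out.write(chunk)


def _read_chunk(inp):
    """Read one length-prefixed chunk, or return None at end of stream."""
    prefix = inp.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack(">I", prefix)
    return inp.read(length)


def read_entries(inp):
    """Yield (header, data) for each file, checking the footer identity."""
    while True:
        raw_header = _read_chunk(inp)
        if raw_header is None:
            return
        header = json.loads(raw_header)
        data = _read_chunk(inp)
        footer = json.loads(_read_chunk(inp))
        if footer["id"] != header["id"]:
            raise ValueError("footer does not match header")
        yield header, data


# Files may be written in any order; the reader needs no separate index.
buf = io.BytesIO()
write_entry(buf, "/etc/hosts", b"127.0.0.1 localhost\n", mtime=1000)
write_entry(buf, "/home/a/notes.txt", b"hello", mtime=1005)
buf.seek(0)
recovered = {header["id"]: data for header, data in read_entries(buf)}
```

Because each entry carries its own identity and boundaries, the receiver in this sketch rebuilds a mapping from file identifier to contents without consulting any separate file of stream offsets.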
Once all transmitted files are within the cache, the backup system checks the transmitted metadata and its records of the previous backup to determine whether there are any files that should have been backed up with the previous backup, but were not sent with the previous backup information, and looks for these files in the cache. If any such files are not within the cache, the backup system requests these files from the computer system. Once these files are received, the backup system now has everything needed to produce a full backup volume.
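One possible form of this check is sketched below. It assumes the backup system keeps, for each file, the modification time of the version it last stored, and that the client metadata lists every current file with its modification time; these data shapes and the name `files_to_request` are hypothetical.

```python
from typing import Dict, List, Set


def files_to_request(
    client_meta: Dict[str, int],
    previous_record: Dict[str, int],
    cache_ids: Set[str],
) -> List[str]:
    """Return identifiers of files the backup system must still request.

    A file is requested when it is not in the cache and the backup system
    either has no record of it or holds only an older version.
    """
    missing = []
    for file_id, mtime in client_meta.items():
        if file_id in cache_ids:
            continue  # already received in this backup session
        stored_mtime = previous_record.get(file_id)
        if stored_mtime is None or stored_mtime < mtime:
            missing.append(file_id)
    return sorted(missing)


client_meta = {"a": 10, "b": 20, "c": 5}  # file id -> mtime on the client
previous_record = {"a": 10, "b": 15}      # file id -> mtime already stored
cache_ids = {"b"}                          # files received this session
to_request = files_to_request(client_meta, previous_record, cache_ids)
```

Here file "c" exists on the client but was never stored (for example, because it was open during the previous backup) and is not in the cache, so it is the only file requested.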
Once all files modified since the last backup are in the cache, the backup server will sequentially scan the presently existing backup tapes in reverse chronological order until the most recent full backup tape is reached and itself scanned. As each backup tape is being scanned, the file stream from that tape is compared with the files currently in the cache. If the file on tape is an earlier version of a later file in the cache (indicating that the file was updated), or the file on tape is not represented in the present metafile (indicating that the file was deleted), the file on tape is skipped. Otherwise, the file is copied into the cache. Once all tapes have been scanned, the metadata and the previous full backup are checked to ensure that all files are complete. If so, the cache is copied onto a new tape, resulting in a full backup tape. If any files are found not to be complete, the backup system will require a full backup, sending a request to the computer system to transmit all files to the backup system.
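The merge described above can be sketched as follows. Tapes are modeled as in-memory lists of (file identifier, data) pairs ordered newest tape first, ending with the most recent full backup; this model and the name `merge_full_backup` are assumptions for illustration, not the actual tape format.

```python
from typing import Dict, List, Optional, Set, Tuple

Tape = List[Tuple[str, bytes]]


def merge_full_backup(
    cache: Dict[str, bytes],
    tapes_newest_first: List[Tape],
    current_ids: Set[str],
) -> Optional[Dict[str, bytes]]:
    """Merge the cache with older tapes into a new full backup image.

    Returns None when some current file cannot be found, which signals
    that a full backup from the computer system is required.
    """
    merged = dict(cache)
    for tape in tapes_newest_first:
        for file_id, data in tape:
            if file_id in merged:
                continue  # a later version is already held
            if file_id not in current_ids:
                continue  # absent from current metadata: file was deleted
            merged[file_id] = data
    if not current_ids <= merged.keys():
        return None
    return merged


cache = {"a": b"a3"}  # "a" was modified since the last backup
tapes = [
    [("a", b"a2"), ("b", b"b2")],                              # incremental
    [("a", b"a1"), ("b", b"b1"), ("c", b"c1"), ("d", b"d1")],  # last full
]
current_ids = {"a", "b", "d"}  # "c" has been deleted on the client
new_full = merge_full_backup(cache, tapes, current_ids)
```

In this example the newest version of each surviving file is kept ("a" from the cache, "b" from the incremental tape, "d" from the full tape) and the deleted file "c" is skipped, all without re-reading unchanged files from the computer system.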
In the event of damage to the new full backup tape, the tapes within the backup system used to produce this new tape still exist within the backup system, and can be used to recover the information lost from the new backup tape.
Accordingly, it is an aspect of the present invention to provide a backup system for computer systems wherein the majority of backup operations are performed within the backup server instead of within the computer system.
It is another aspect of the present invention to provide a backup system for computers minimizing the need for full network backups, wherein all files are transmitted from the computer system to the backup system.
It is a further aspect of the present invention to provide a backup system for computers wherein files modified since the last full backup are merged with the files presently existing in the backup system to produce a new full backup tape, or, if desired, one or more mid-level backups at levels desirable to a particular client, without placing the demands on the client's resources normally required for the desired backup levels.
It is another aspect of the present invention to provide a backup system for computers wherein the file stream transmitted from the computer system to the backup system contains metadata headers and footers for each file transmitted, thereby providing unique identifiers within the file stream for each file, and thereby permitting transmittal of all files in any order, and eliminating the need for a separate file containing the location of files transmitted within the file stream.
It is a further aspect of the present invention to provide a backup system having redundancy to the last backup taken, for all backups except the first full backup, thereby maximizing recovery potential in the event of problems within the backup system.
These and other aspects of the present invention will become apparent through the following description and drawings.