The need for the backup of computers is well known. Simply stated, computers have hard disks containing much valuable data. Unfortunately, no such storage technology is completely free from the possibility of failure, nor is any immune to loss by theft, fire, or natural disaster. In the case of such failure or loss, all data unique to the computer hard disk is generally permanently lost. The goal of any backup system is to maintain the controlled redundancy of data to avoid and minimize the loss of unique data.
In current practice, computer backup systems are of two distinct types: continuous backup and batch-oriented backup. Continuous backup systems copy data simultaneously and continuously while the data is being created or modified. Batch-oriented backup systems operate at distinct points in time to copy all selected data at that point in time. Each of these types has advantages and disadvantages, and some installations make use of both approaches simultaneously. This invention pertains to batch-oriented backup only.
Batch-oriented backup systems may be further subdivided into two types: local backup and network backup. Network backup systems share the archive storage media among more than one computer, while local backup systems do not.
It should be understood that while the case of the network computer backup is used in these descriptions, this invention is equally applicable to the simpler case of a local backup system. Also, the computer memory is referred to here specifically as a hard disk, but a different memory technology such as a semiconductor disk emulator would be equally applicable.
I have developed a backup system for computers. This system is set forth in my U.S. Pat. No. 5,150,473 entitled DATA STORAGE FORMAT FOR ADDRESSABLE OR SEQUENTIAL MEMORY MEDIA, issued Sep. 22, 1992. To summarize, a data archive format is disclosed for archiving computer information taken from computer sessions, each session being of the form that includes a so-called "root directory" with appropriate branches leading to each discrete file or subfile within the computer session. The archive format includes the transfer of data to an archive media member, which archive media member can alternatively be addressable or sequential memory and can be recordable in either a re-writable or write-once manner. When data is archived, a separate and resident archive directory is created in the immediate memory of the computer having the data to be archived, the purpose of this separate and resident archive directory being to maintain high speed during reading and writing of the archived data. This separate and resident archive directory is updated during the archiving process, used to access and retrieve the archived data during the recovery process, and distributed through the archive media in a non-predictive, largely non-redundant but recoverable format. Upon failure of the memory containing the separate and resident archive directory, reconstruction of the separate and resident archive directory is possible.
In current practice, a network computer backup system consists of one or more archive storage devices connected to a computer on the network, and thus shared among a set of computers similarly connected. The computer with the storage devices, known as the backup server, contains a computer program to transfer data from the other computers, known as the clients or sources, to the local storage devices.
In what follows, I offer what seems a simple analysis of backup of the prior art. Unfortunately, backup is a remote field of computer science that not only suffers neglect but additionally has its common problems not set forth with clarity and therefore not understood. It will be understood therefore that in stating the faults of the prior art, I am delineating the problem to be solved. And since discovery of the problem to be solved can constitute invention, I claim invention in being first in setting forth both problem and solution that this invention offers to the computer data backup field.
The prior art exhibits two approaches to sharing the backup computer among the various client computers. The first is to allow the user of each client computer to initiate a backup by contacting the backup computer and requesting to store data. This is known as the "push" model. When the backup computer is busy, the client computer is told to wait, generally in first-come first-served order, until the backup computer becomes available. With this scheme, the client backup requests are almost always clustered at certain times, resulting in delays for everyone. Also, client backups must be copied to the currently available storage media, where the random nature of the client requests results in random distribution of their data on the media. This makes it difficult to restore the data for a single client. Finally, users usually forget to initiate backups on a regular schedule, especially if doing so involves delays.
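The clustering delay of the push model can be illustrated with a minimal sketch. It assumes a single backup computer that serves requests strictly first-come first-served; the function name `push_model_waits`, the client names, and all arrival times and durations are hypothetical values chosen for illustration and do not come from any particular prior art product.

```python
from collections import deque


def push_model_waits(requests):
    """Serve push-model backup requests first-come first-served.

    requests: list of (client, arrival_minute, duration_minutes) tuples
    (hypothetical values). Returns a dict mapping each client to the
    number of minutes it waited before its backup could start.
    """
    # Order requests by arrival time; the server handles one at a time.
    queue = deque(sorted(requests, key=lambda r: r[1]))
    server_free_at = 0
    waits = {}
    while queue:
        client, arrival, duration = queue.popleft()
        start = max(arrival, server_free_at)  # wait while the server is busy
        waits[client] = start - arrival
        server_free_at = start + duration
    return waits


# Three clients requesting a 30-minute backup at nearly the same time ...
clustered = [("A", 0, 30), ("B", 1, 30), ("C", 2, 30)]
# ... versus the same three requests spread an hour apart.
spread = [("A", 0, 30), ("B", 60, 30), ("C", 120, 30)]

print(push_model_waits(clustered))  # {'A': 0, 'B': 29, 'C': 58}
print(push_model_waits(spread))     # {'A': 0, 'B': 0, 'C': 0}
```

Under these assumptions the same total work produces no waiting when requests are spread out, and a wait that grows with each additional client when they arrive together.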
The second sharing approach is the "pull" model, where the backup computer accesses the client computers according to a preset schedule defined by an administrator. For example, computers are left on at night. The backup system starts up at a certain time, accesses the computers sequentially, and copies their data to storage media. When the computers' users return in the morning, backup has occurred. While the clustering effect and random data distribution of the push model are avoided, a client will not be backed up if it is not available at the given time.
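The availability limitation of the pull model can likewise be sketched in a few lines. This is an illustration only: the function `pull_backup`, the client names, and the set of machines assumed to be reachable at the scheduled time are all hypothetical.

```python
def pull_backup(schedule, available):
    """Visit clients in the administrator's fixed order at the scheduled time.

    schedule: list of client names in visiting order (hypothetical).
    available: set of clients reachable on the network at that time.
    Returns (backed_up, missed) as two lists.
    """
    backed_up, missed = [], []
    for client in schedule:
        if client in available:
            backed_up.append(client)  # data is copied to the storage media
        else:
            missed.append(client)     # client is skipped for this run
    return backed_up, missed


schedule = ["desktop-1", "desktop-2", "laptop-1"]
available_at_night = {"desktop-1", "desktop-2"}  # the laptop went home

print(pull_backup(schedule, available_at_night))
# (['desktop-1', 'desktop-2'], ['laptop-1'])
```

A client that is never connected at the scheduled time lands in the missed list on every run.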
Unfortunately, many computers are now portable. They are only occasionally connected to a network, and thus may never be available at the predetermined time that a "pull" backup would take place. But "push" backups rely on user discipline and have clustering and availability problems. The full magnitude of these issues can be considered by taking the case of the salesman taking orders on a portable computer.
First, the salesman's computer is rarely in the office, where connection to the network--and hence to the backup system--can occur.
Secondly, when the salesman is in the office and connected to the network, it is usually at times when all other salesmen are in the office and connected to the network. For example, it is common for salesmen to be in the office at the beginning of the day and the end of the day. Salesmen are not usually in the office in the middle of the day because they are out selling.
Third, trying to have the salesmen themselves initiate backup is generally not satisfactory. Typically, all salesmen want to back up at the same time. For example, backup among a sales force is common in the late afternoon. Frequently, rather than wait for backup to finish, salesmen disconnect their respective computers from the network and leave.
Fourth, and considering the case of the organization that has a large number of sales personnel, backups do not statistically spread themselves out. Specifically, when the largest number of salesmen is present, the largest number of backups is likely to occur.
A final complication to the backup task, ill-handled in the prior art, is that the amount of data, and thus the time required to copy it, is highly variable among various computers and usually unknowable in advance of the backup. For example, a backup operation may complete in less than a minute when only a small file has changed since the previous backup, while the subsequent backup of the same disk may find that virtually every file has changed, requiring hours or even days to complete the copying operation. Also, the typical administrator will need to regularly perform a "full backup", where all source data is copied to a set of new or erased media. Because copying such large volumes of data may use significant resources, making it difficult or impossible for users to share the same computer or network, administrators of the traditional "pull" backup will often attempt to back up at night or on weekends when usage is light. But all too often the backups are still running when this "backup window" has ended, adversely affecting the users or requiring the administrator to terminate the backup manually. As the quantity of data storage inevitably increases, the backup duration eventually reaches the point of always exceeding the backup window. The data on the last computers or the last part of the disk drive will then never be backed up, a problem known as starvation.
Starvation is easy to understand. In the typical pull backup, all computers on the network are backed up in the same order in each backup window. Naturally, those computers at the head of the list will be backed up first. Those computers at the end of the list will be backed up last. And where the time required for a complete backup of all computers exceeds the backup window available, backup will not occur for those computers at the end of the backup list. These computers will be "starved" in the sense that they will never be backed up. And each time backup is initiated, it will occur in the same order on the available computers, with the same starvation problem resulting.
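Starvation under a fixed backup order can be demonstrated with a short simulation. The sketch makes simplifying assumptions not stated above: each computer's backup duration is known and constant, and any backup that cannot finish inside the window is not started, so every later computer on the list is also skipped. The function `run_window`, the client names, and the hour values are hypothetical.

```python
def run_window(order, durations, window):
    """Back up clients in a fixed order until the backup window runs out.

    order: list of client names, visited in the same order every window.
    durations: dict of client name -> hours needed (hypothetical values).
    window: length of the backup window in hours.
    Returns the list of clients backed up within this window.
    """
    elapsed = 0
    done = []
    for client in order:
        if elapsed + durations[client] > window:
            break  # window exhausted; this and all later clients are skipped
        elapsed += durations[client]
        done.append(client)
    return done


order = ["c1", "c2", "c3", "c4"]
durations = {client: 3 for client in order}  # 12 hours of work in total
window = 8                                   # but only an 8-hour window

# The same order is used on three successive nights.
nights = [run_window(order, durations, window) for _ in range(3)]
never_backed_up = set(order) - {c for night in nights for c in night}

print(nights[0])                # ['c1', 'c2']
print(sorted(never_backed_up))  # ['c3', 'c4']
```

Because the order never changes, repeating the run on further nights yields the same result: the computers at the end of the list are starved indefinitely.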