This application relates to digital data processing and, more particularly, to methods and apparatus for improved backup storage.
Use of high-performance, high-resolution graphics has grown significantly over the last several years, with indications that it will continue to grow through the next decade. Fueling user demand has been the introduction of lower cost 32-bit workstations and an increase in the base of applications software available for those systems. Because of their computational and graphics power, these workstations are employed in data-intensive applications such as electronic publishing, computer-aided design and scientific research.
Paralleling these developments has been the emergence of industry standard communication protocols which permit users to operate in a multi-vendor environment. Of particular interest is the Network File System (NFS) protocol, developed by Sun Microsystems, allowing users to share files across a local area network such as Ethernet.
The data storage requirements of professional workgroups employing networked workstations can measure in the billions of bytes (gigabytes). Much of this results from users generating and accumulating large file sets which have aged, but which are considered valuable for future use.
Potential costs associated with the online storage of such data accumulations force system administrators to periodically remove inactive data from the online file server environment, archiving it to magtape or an equivalent medium. This "solution" poses its own problems. First, the selection of information to be archived is often arbitrarily based solely on the amount of disk space allocated to a user. Secondly, in many computing systems, e.g., those running under the UNIX operating system, no ready means is provided for cataloging information stored offline, requiring users to manually track their own archived files.
Related problems occur in the "backing up" of data--that is, the storing of copies of computer files (typically on off-line media) that can be recovered in the event of data loss. Conventional prior backup systems use two types of backups, full and incremental. During a full backup, the system transfers copies of all the data on the computer (or network) to a set of one or more backup volumes, e.g., magnetic tapes. During later incremental backups, copies of all of the files that have changed since either the last full backup or the most recent incremental backup are transferred to those volumes.
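By way of illustration only, the distinction between the two backup types can be sketched as a file-selection rule. The function name `select_files_for_backup` and the use of modification timestamps as the change test are assumptions made for this sketch; they are not taken from any particular prior system.

```python
from typing import Dict, List, Optional


def select_files_for_backup(
    mtimes: Dict[str, float],
    last_backup_time: Optional[float],
) -> List[str]:
    """Return the paths to copy to the backup volumes.

    mtimes maps each file path to its last-modification time (seconds).
    last_backup_time is None for a full backup; otherwise it is the time
    of the previous full or incremental backup (hypothetical convention).
    """
    if last_backup_time is None:
        # Full backup: every file is copied.
        return sorted(mtimes)
    # Incremental backup: only files modified after the previous backup.
    return sorted(p for p, m in mtimes.items() if m > last_backup_time)


# Illustrative data: three files with made-up modification times.
files = {"/home/a/report.txt": 100.0, "/home/a/plot.dat": 250.0, "/home/b/notes": 300.0}
full_set = select_files_for_backup(files, None)
incremental_set = select_files_for_backup(files, 200.0)
```

In this sketch the incremental set contains only the two files modified after time 200, whereas the full set contains all three.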
Because they are complete copies of all data, full backups can be very time consuming. Even assuming a high-performance backup device capable of 1 MB/sec transfers, and software capable of driving such a device at its rated capacity, it takes roughly 28 hours to back up a relatively modest 100 GB of data. The times required to perform a full backup of terabyte and larger storage are prohibitively long for normal use.
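The scale of these transfer times can be checked with a short calculation. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and a device sustained at its full rated throughput; both are simplifying assumptions. Under them, 100 GB requires somewhat more than a day and a terabyte well over a week.

```python
def full_backup_hours(data_bytes: float, rate_bytes_per_sec: float) -> float:
    """Return the hours needed to stream data_bytes at a sustained rate."""
    return data_bytes / rate_bytes_per_sec / 3600.0


# Decimal unit sizes (an assumption; binary units give slightly larger figures).
MB = 10**6
GB = 10**9
TB = 10**12

hours_100gb = full_backup_hours(100 * GB, 1 * MB)
hours_1tb = full_backup_hours(1 * TB, 1 * MB)

print(f"100 GB at 1 MB/sec: {hours_100gb:.1f} hours")
print(f"1 TB at 1 MB/sec: {hours_1tb:.1f} hours ({hours_1tb / 24:.1f} days)")
```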
Incremental backups are generally much smaller than full backups and, hence, can be performed much faster. As a disadvantage, however, restoring lost data generally takes longer as the number of incremental backups increases, since several backup volumes--the original full backup volume and all subsequent incremental volumes--must be processed in order to recover current files.
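The growth in restore effort can be illustrated with a minimal sketch. The `BackupVolume` record and the function `volumes_needed_for_restore` are hypothetical names, and the sketch assumes each incremental backup records changes since the immediately preceding backup, so that every volume written after the most recent full backup must be read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupVolume:
    label: str
    kind: str  # "full" or "incremental"


def volumes_needed_for_restore(history: List[BackupVolume]) -> List[BackupVolume]:
    """Return the volumes to process, oldest first, to recover current files.

    history lists all backup volumes oldest first. The result is the most
    recent full backup followed by every later incremental volume.
    """
    last_full = None
    for index, volume in enumerate(history):
        if volume.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup in history")
    return history[last_full:]


# Illustrative history: one full backup followed by five incrementals.
history = [BackupVolume("week0-full", "full")] + [
    BackupVolume(f"day{n}-incr", "incremental") for n in range(1, 6)
]
restore_set = volumes_needed_for_restore(history)
```

Here six volumes must be processed for a restore; each additional incremental backup taken before the next full backup adds one more.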
One attempt to automate incremental backup procedures is described in Hume, "The File Motel-- An Incremental Backup System for UNIX," Summer USENIX '88, pp. 61-72. That system is understood to utilize batch jobs to send copies of modified files to a central backup system, which stores those copies, e.g., on a write-once-read-many (WORM) optical disk, along with internally generated file names. A database links the original filename and modification times to the backup copy name.
The Hume system is understood to suffer a number of drawbacks. Among these is the difficulty in knowing a file's "true" name (e.g., where a portion of that name includes a symbolic link) and, consequently, in discerning which backed-up file to recover. In addition, there exists the risk that, as a result of a network or machine failure, a file will not get backed up until a day or two after it has been modified. Still further, lacking a full backup facility, the capability of recovering any particular file under the Hume system requires that the user have specifically selected that file for backup.
In view of these and other problems presented by prior art backup systems, an object of this invention is to provide digital data processing apparatus and methods with improved backup storage.
Another object of the invention is to provide a backup system that reduces the time and cost associated with conventional full/incremental backup schemes.
A further object of the invention is to provide a backup mechanism amenable for use in conjunction with hierarchical or mass storage servers and networks.