1. Technical Field
The present invention relates generally to digital data processing systems, and, in particular, to methods and apparatus in digital data processing systems having on-line file systems for archiving the contents of one or more user volumes stored in those file systems.
2. Background Art
In general, in the descriptions that follow, we will italicize the first occurrence of each special term of art which should be familiar to those skilled in the art of digital data processing systems. In addition, when we first introduce a term that we believe to be new or that we will use in a context that we believe to be new, we will bold the term and provide the definition that we intend to apply to that term. Since our invention is specifically intended for use in digital data processing systems, we will often use terms that are well known to those skilled in this particular art. For example, with respect to an individual element of information stored at a particular address in a memory component of such a system, we will typically use the term pointer to refer, not to the element per se, but to a separate and distinct element that contains the address of the referenced element. Other common terms of art that we may use include: b for bit and B for byte; msb for most significant bit and lsb for least significant bit; and MB for megabyte. From time to time, we may refer to various commercial software products whose names may be either common law or registered trademarks of the respective owners thereof.
There are a number of products on the market that provide some form of file system protection against different kinds of failures. In general, failures can be categorized as either catastrophic (unrecoverable) or recoverable. A catastrophic failure is a failure in which the contents of one or more of the files comprising a volume or partition stored in the file system are corrupted either irretrievably or beyond the point at which it is cost effective to try to recover the contents of the disk media. In a catastrophic failure, the system (or disk) is no longer operable. Such a failure can be due to hardware causes, such as a crashed disk head, or to software causes, such as a bad OS build or a virus run rampant. In contrast, a recoverable failure is a failure in which the contents of the file system are corrupted or incorrectly modified, but the system (or disk) is still functional and critical portions of the file system are still intact. The most common example of a recoverable failure is a user inadvertently deleting a file that is still of value.
Protection approaches protect either the current state of the file system or a history of the file system over time, the latter through a series of archival snapshots taken under the control of the system operators.
Approaches that protect the current state of the file system effectively provide protection against hardware failures, but nothing more. They include:
File system mirroring whereby a file system exists on two physically distinct disks or disk sets, either co-located or in geographically distinct locations. In a mirrored system, each file system write operation is done on both sets identically. If one disk or disk set fails, the other provides continuity until the failed set can be replaced and mirroring resumes.
    Mirroring protects the current state of the file system from recoverable and catastrophic failures due to physical hardware failures, most notably disk crashes.
    It provides no protection against failures due to software problems.
    It provides no archiving of older copies of objects in the file system.
RAID disk systems interleave a file system across two or more disks operating in parallel. RAID provides higher throughput but can also provide protection against the failure of one or more disks in a manner similar to but more efficient than mirroring when a large number of disks are used. The kind of protection it provides is equivalent to that of mirroring.
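The limits of mirroring described above can be illustrated with a minimal sketch. The class `MirroredVolume` and its methods are hypothetical names introduced only for illustration, and in-memory dictionaries stand in for physical disks; this is not a description of any particular product.

```python
class MirroredVolume:
    """Toy mirrored volume: every operation is applied to all live replicas."""

    def __init__(self):
        # Two in-memory dicts stand in for two physically distinct disks.
        self.replicas = [{}, {}]

    def _live(self):
        # Replicas that have not suffered a (simulated) hardware failure.
        return [r for r in self.replicas if r is not None]

    def write(self, path, data):
        for replica in self._live():
            replica[path] = data

    def delete(self, path):
        # A deletion is mirrored just as faithfully as a write.
        for replica in self._live():
            replica.pop(path, None)

    def fail_disk(self, index):
        # Simulate a disk crash: the contents of that replica are lost.
        self.replicas[index] = None

    def read(self, path):
        live = self._live()
        if not live:
            raise IOError("all replicas have failed")
        return live[0].get(path)


vol = MirroredVolume()
vol.write("report.txt", b"v1")

vol.fail_disk(0)
after_disk_failure = vol.read("report.txt")  # served by the surviving replica

vol.delete("report.txt")                     # inadvertent user deletion
after_user_delete = vol.read("report.txt")   # no older copy exists anywhere
```

In this sketch the hardware failure is survived, but the inadvertent deletion is not, because the mirror holds only the current state.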
Protection schemes that provide archival of snapshots operate by allowing the operator to make an image of the entire file system either upon demand or at intervals scheduled manually or automatically.
Typical schemes allow for either manual initiation of a backup or automatic initiation at a regular interval, such as once per day. In the case of a tape archive, backups are available until old tapes are recycled for use making new backups. In the case of disk, snapshots are available until they are deleted to make space for new snapshots.
To create a snapshot, the backup system copies the contents of the protected file system to archival media, either tape or another disk. At the time of each backup, the backup system can either copy the entire file system (a full save), or just those files which have changed since the last backup (an incremental save), allowing a snapshot to maintain an accurate representation of the protected file system at some point in time. Some backup systems provide an explicit mechanism for keeping multiple snapshots available on a single disk medium to allow easy recovery of more than a single version of a file.
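The difference between a full save and an incremental save can be sketched as follows. This is an illustrative model only: the file system is assumed to be a dictionary mapping each path to a (modification time, contents) pair, and the function names `full_save`, `incremental_save`, and `restore` are hypothetical.

```python
def full_save(fs):
    """Copy every file; the result is a self-contained snapshot."""
    return dict(fs)


def incremental_save(fs, last_backup_time):
    """Copy only the files modified after the previous backup."""
    return {path: (mtime, data)
            for path, (mtime, data) in fs.items()
            if mtime > last_backup_time}


def restore(full, incrementals):
    """Rebuild the file system by replaying incrementals over the full save."""
    state = dict(full)
    for inc in incrementals:      # oldest first
        state.update(inc)
    return state


# File system at time 1, followed by a full save.
fs = {"a.txt": (1, "alpha"), "b.txt": (1, "beta")}
full = full_save(fs)

# At time 2 one file is modified and one is created.
fs["b.txt"] = (2, "beta v2")
fs["c.txt"] = (2, "gamma")
inc = incremental_save(fs, last_backup_time=1)

restored = restore(full, [inc])
```

Here the incremental save holds only `b.txt` and `c.txt`, yet replaying it over the full save reproduces the file system as of time 2. The sketch deliberately ignores deleted files, which a real backup system would also have to record.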
Most backup systems allow the backup to take place while the system is in operation. While such a capability is favorable, it results in an imprecise image of the file system, since the file system may be changing even as the backup takes place. The result is not a precise snapshot, but an approximate snapshot taken over a potentially large interval of time, perhaps as much as several hours. This is potentially problematic, given the rapid pace at which file system contents can change. Because of this, it is not always possible to restore the file system to a consistent state, even from a good snapshot, since the snapshot is a fuzzy representation of the file system captured over a relatively long period of time.
One approach to producing an exact and guaranteed usable image of a file system is to take the file system (or entire computer) offline, rendering the contents of the file system inert. Once this is done, a snapshot can be taken, safe in the knowledge that the file system will remain unchanged during the entire interval required for the snapshot. We call this an offline snapshot. A significant downside to this approach is its intrusive nature, since the file system, and perhaps the entire computer, is not available for use during a potentially long period of time.
With mirroring, one can take one of the two or more mirrored volumes offline and snapshot it while the remaining volumes stay in service, thereby getting the best of both worlds, but this is not practical for small system users.
Another approach is to use a copy-on-write method to allow taking what is effectively an offline snapshot while allowing the file system to remain accessible for use. If any files are changed while the file system snapshot is being recorded, two copies of each changed file are kept: the original or archival copy, used by the snapshot process, and the modified or live copy, used by everything else. When the snapshot concludes, the archival copies of all files modified during the snapshot are removed, leaving only the live copies.
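The copy-on-write behavior just described can be sketched in a few lines. The class `CowFileSystem` and its method names are assumptions made for illustration, files are modeled as dictionary entries, and the sketch operates at whole-file granularity rather than on disk blocks.

```python
class CowFileSystem:
    """Toy file system that supports one copy-on-write snapshot at a time."""

    def __init__(self, files):
        self.live = dict(files)
        self.archival = None        # None means no snapshot is in progress

    def begin_snapshot(self):
        self.archival = {}

    def write(self, path, data):
        # On the first change to a file during a snapshot, keep the original
        # (None if the file did not exist when the snapshot began).
        if self.archival is not None and path not in self.archival:
            self.archival[path] = self.live.get(path)
        self.live[path] = data

    def snapshot_read(self, path):
        # The snapshot process sees the archival copy when one exists.
        if path in self.archival:
            return self.archival[path]
        return self.live.get(path)

    def end_snapshot(self):
        # Discard the archival copies, leaving only the live copies.
        self.archival = None


cow = CowFileSystem({"a.txt": "old a", "b.txt": "old b"})
cow.begin_snapshot()
cow.write("a.txt", "new a")         # ordinary users keep working

# The snapshot process still records the contents as of begin_snapshot().
snap = {p: cow.snapshot_read(p) for p in ("a.txt", "b.txt")}
cow.end_snapshot()
```

The snapshot captures `old a` even though the live file already holds `new a`, which is what makes the result equivalent to an offline snapshot in this simplified model.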
Yet another approach is to automatically create a new snapshot or update an existing one on a very frequent basis, such as every hour or perhaps each and every time a file system object is written. We refer to this approach as continuous. A continuous approach comes closer to achieving the precision of an offline snapshot while leaving the file system available for use. Unlike the copy-on-write approach, the continuous approach allows copying data from the file system to another storage medium as it changes, reducing the window of vulnerability should something happen to the file system in between potentially infrequent snapshots. A significant problem with the continuous approach is the large amount of storage required. As with the other approaches, the storage space required grows linearly over time, since in the typical case about the same number of changes are made to the file system over any reasonable period of time. Unlike the other approaches, however, the amount of storage required could be dramatically greater, since we are snapshotting the file system not once per day or once per week, but potentially at each and every file system object update.
Examples of prior art in the general field of file system archiving include: Unix, Microsoft Windows backup, TLM 6.1, LiveBackup, CMS, Echo, DiskImage, Norton Ghost, Dantz Retrospect, and CVS.
We submit that what is needed is a more efficient method and apparatus for archiving the contents of an on-line file system, and, in particular, one wherein modified versions of existing files are archived essentially in real-time, but older, previously-archived versions of such modified files are selectively discarded.
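The storage cost of the continuous approach can be made concrete with a small sketch. The class `ContinuousArchive` and its methods are hypothetical, writes are assumed to arrive in timestamp order, and every write is archived in full with no compression or discarding of old versions.

```python
from collections import defaultdict


class ContinuousArchive:
    """Toy archive that keeps a copy of a file every time it is written."""

    def __init__(self):
        # path -> list of (timestamp, data), appended in timestamp order
        self.versions = defaultdict(list)

    def on_write(self, path, timestamp, data):
        self.versions[path].append((timestamp, data))

    def stored_bytes(self):
        return sum(len(data)
                   for history in self.versions.values()
                   for _, data in history)

    def version_at(self, path, timestamp):
        """Return the newest archived data at or before timestamp, or None."""
        older = [d for t, d in self.versions.get(path, []) if t <= timestamp]
        return older[-1] if older else None


archive = ContinuousArchive()

# A 100-byte file rewritten once an hour for a day.
for hour in range(24):
    archive.on_write("log.txt", hour, bytes([hour]) * 100)

total = archive.stored_bytes()            # 24 versions of 100 bytes each
at_five = archive.version_at("log.txt", 5)
```

Under these assumptions the archive holds 2400 bytes for a file whose single daily snapshot would occupy 100 bytes, while in exchange any hourly version can be recovered.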