1. Field of the Invention
The invention relates to computing systems and, more particularly, to file systems.
2. Description of the Related Art
As is well known, file system backups in computing systems may take a considerable amount of time and storage space. In many file systems, a significant portion of the data is not changed after its creation or after an initial period of access. Generally speaking, the conventional approach to data backup includes periodically performing a full backup of everything in the file system, for example once a week or once a month, and performing incremental backups between full backups, for example every day. Typically, this conventional approach makes a copy of all of the data in the file system, even though a large percentage of that data may not have changed since the previous full backup. In order to restore from a previous backup, the most recent full backup is typically restored first, and then any data changed since that full backup is restored from the incremental backups performed after it.
While a variety of backup and restore approaches similar to that above are available, such approaches may not meet the needs of a particular enterprise. Because data may be backed up only on a periodic basis, numerous versions of a given file may have come and gone between backups and may not be recoverable. In addition, performing a conventional restore operation tends to be very time consuming and often results in recovery of far more data than was requested. Consequently, in recent years, alternative approaches to data protection have arisen. In particular, approaches sometimes referred to as “Continuous Data Protection” (CDP), or “continuous backup”, have been developed to meet the needs of enterprises. In addition, versioning file systems have been proposed and developed to address some of the concerns described above.
In contrast to conventional approaches to backup and restore, continuous data protection typically involves the use of additional hard drive storage to mirror main storage and to keep an up-to-date record of changes to the data storage in a continuous, time-based manner. Should data corruption occur, a state of the data immediately prior to the corruption may be identified and used to restore the data. As the approach is generally hard drive based, restoration may be achieved relatively quickly compared to restoration from tape. In addition, as changes to data are monitored and recorded in a generally real-time manner, all changes to, or versions of, a given file may be recoverable. Accordingly, continuous data protection approaches may address some of the perceived problems with conventional backup and restore approaches. It is noted that CDP is typically used to enhance the data protection abilities of a given enterprise, rather than to replace traditional backup and restore operations. Consequently, periodic backups of a more conventional type may continue to be performed.
Versioning file systems, on the other hand, are generally configured to retain earlier versions of files within the file system. By maintaining earlier versions of files, it may be possible to recover data from a previous known good state in the event of data corruption. As may be appreciated, the granularity of the prior versions retained, and the length of time for which they are retained, may have a significant impact on the amount of data which must be maintained. Consequently, a variety of approaches exist for determining which versions to retain and how existing data may be pruned to reduce the amount of data maintained.
As may be appreciated, file systems and the amount of data being backed up or otherwise protected may be very large. Consequently, catalogs, indices, and other metadata associated with such data can also be very large. For example, catalogs for some large file systems may exceed 500 GB. An enterprise utilizing the enhanced capabilities of continuous data protection, or of a versioning file system, may have certain performance expectations. One of the advantages of such data protection schemes is the rapidity with which a restore operation may be performed. As the amount of metadata can be very large, and can include very large numbers of objects or entities (e.g., some identifiable object for every change that occurs with respect to the data being protected), management of the metadata itself is an important factor in overall system performance. In addition, the particular approach used for managing the metadata may also affect how much space is required for the metadata.
Accordingly, an effective method and mechanism for retaining earlier versions of data in a file system is desired.