1. Field of the Invention
The invention relates to systems and methods for backing up data. Specifically, the invention relates to systems and methods to preserve memory resources during backup to facilitate large scale data backup.
2. Description of the Related Art
Recent advances in disk storage have made it possible to store increasingly large numbers of files on a computer at minimal expense. As a result, simplistic data management systems, while adequate to manage and protect smaller quantities of data, may fall short where large scale data management is required.
Traditionally, for example, a file attribute bit, or archive bit, has been used to indicate whether a local file has undergone a data change since a previous data management operation. The archive bit, however, is vulnerable to corruption by other user processes, thereby compromising its reliability. Moreover, the archive bit fails to take into account server conditions that may require a local file to be backed up, such as damage to or deletion of a backup file.
In response to these shortcomings, modern data management systems have implemented incremental backup systems utilizing complex file attribute information to identify and differentiate between various types of data changes on the local system, as well as on the server. Incremental backup methods effectively reduce the amount of data sent to the server for backup and therefore save both network bandwidth and server storage space.
The Tivoli Storage Manager® data management system, for example, protects an organization's data by storing file attribute information in a central repository. File attribute information may include, for example, update and creation time, date, size, access control lists (“ACL”), and extended information such as mode information, sizes and checksums of relative data streams, and the like. A storage management client application scans the local file system to generate a list of file names and their associated attributes, and then compares the list with the list stored in the central repository. This comparison identifies: (1) new files present on the local file system that are not present in the central repository; (2) deleted files present in the central repository that are not present on the local file system; and (3) changed files having a different set of attributes in the local file system than in the central repository.
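The three-way comparison described above can be illustrated with a minimal Python sketch. It is not the Tivoli Storage Manager® implementation; the `FileAttrs` record, the `compare_file_lists` function, and the sample file names are hypothetical and assumed purely for illustration, and the attribute set is reduced to a modification time and a size.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileAttrs(NamedTuple):
    """Hypothetical attribute record; a real system would also track
    ACLs, checksums, mode information, and so on."""
    mtime: int
    size: int


def compare_file_lists(
    local: Dict[str, FileAttrs],
    repository: Dict[str, FileAttrs],
) -> Tuple[List[str], List[str], List[str]]:
    """Return (new, deleted, changed) file names, each sorted."""
    # (1) Present locally, absent from the central repository.
    new = sorted(name for name in local if name not in repository)
    # (2) Present in the central repository, absent locally.
    deleted = sorted(name for name in repository if name not in local)
    # (3) Present in both, but with differing attributes.
    changed = sorted(
        name
        for name in local
        if name in repository and local[name] != repository[name]
    )
    return new, deleted, changed


# Assumed sample data for illustration only.
local = {
    "a.txt": FileAttrs(mtime=100, size=10),
    "b.txt": FileAttrs(mtime=250, size=20),
    "c.txt": FileAttrs(mtime=300, size=30),
}
repository = {
    "a.txt": FileAttrs(mtime=100, size=10),
    "b.txt": FileAttrs(mtime=200, size=20),
    "d.txt": FileAttrs(mtime=50, size=5),
}

new, deleted, changed = compare_file_lists(local, repository)
print(new, deleted, changed)  # ['c.txt'] ['d.txt'] ['b.txt']
```

In this sketch both file lists are held in memory in their entirety before the comparison begins, so memory use grows with the total number of files on each side.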
While this information effectively streamlines data management operations and increases backup process reliability, it can also require large amounts of memory and time. Typically, in fact, many gigabytes of memory are needed to represent files in a local or central repository file list. For large scale data backup, the amount of memory needed to accomplish a comparison of file lists may easily exceed the amount of real or virtual memory available for such an operation. Moreover, the amount of time required to scan for files stored locally and in the central repository to create file lists for comparison can exceed available time.
Other prior art data management systems have attempted solutions to these problems by, for example, breaking up logical file systems into smaller logical file systems, extending the amount of virtual memory available, processing entries from a server one directory at a time, and/or journaling changes to data on the local system. Each such system, however, suffers from individual shortcomings. In particular, breaking up logical file systems into multiple logical file systems may be unattractive to customers that inherit large file systems due to server or information technology consolidation processes. Extending the amount of virtual memory available only postpones the problem of insufficient memory. Processing entries from a server one directory at a time may nevertheless deplete memory and time resources where large quantities of files are stored within a single directory. Journaling systems are not compatible with all client platforms, and may be unreliable, requiring reconciliation with a central repository to ensure their accuracy.