1. Technical Field
The present invention relates generally to data file storage systems and, more particularly, to a changed files list with time buckets for efficient storage management.
2. Description of the Related Art
Use of electronic data storage for long-term recordkeeping is increasing at an exponential rate. Much of this data is stored in file systems. Moreover, much of this data is write-once and is to be retained for long periods of time.
The most commonly used disk storage devices are cheap, but not free, and they are neither perfectly reliable nor absolutely durable. Accordingly, there is a need to migrate data to cheaper and/or more reliable media, a need to back up data, and a need to make replicas.
The vast amounts of data and numbers of files maintained make manual management of data backup, replication, retention, and deletion burdensome, error-prone, and impractical. Also, government regulations and business requirements demand that data management be conducted according to policy rules that conform to laws, practices, and so forth.
Even in a typical consumer home, there will be tens of thousands of files. For example, consider the operating system(s) and application program files, as well as financial documents and digital media such as photos (e.g., JPEG), music (e.g., MP3), and movies (e.g., MPEG). In an enterprise with thousands of employees, customer databases, and so forth, there can be hundreds of millions of files to be managed.
Taken together, the multitude of legal and business requirements and the vast number of file objects to be managed necessitate the automated application of data management policy rules.
Currently, almost every implementation of a data management system for files operates by reading the complete catalog of all directory entries for all of the files each time a management job is initiated.
The overhead of searching and reading the file catalogs and directories (i.e., scanning the metadata of a file system) while performing policy-driven or rule-driven maintenance operations, such as backup and data migration, consumes a significant number of processing cycles. This overhead is becoming a significant problem and expense in the operation of these systems, as exemplified by Tivoli Storage Manager (TSM) (data backup) and Tivoli Storage Manager for Space Management (HSM) (data migration, which is also known as hierarchical storage management).
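By way of illustration only, the full scan described above can be sketched as follows. This is a minimal sketch and not the implementation of any particular product: the function name full_scan_changed_files, its parameters, and the use of the modification time as the change criterion are assumptions made for the example. The point it shows is that every directory entry is examined on every run, even when only a few files have changed.

```python
import os


def full_scan_changed_files(root, last_run_time):
    """Walk every entry under `root` and collect files modified after `last_run_time`.

    Hypothetical helper for illustration: the cost is proportional to the
    total number of files, not to the number of files that changed.
    """
    changed = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1
            try:
                st = os.stat(path)
            except OSError:
                # The file may have been removed while the scan was running.
                continue
            # Assumption: modification time is the only change criterion.
            if st.st_mtime > last_run_time:
                changed.append(path)
    return changed, examined
```

The second returned value counts the entries examined, which makes the gap between the work performed and the number of changed files visible.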
Regarding the prior art, recent versions of data backup products for WINDOWS NTFS partially address the above-described problem by implementing a change journal based backup feature. However, this approach has several limitations. First, the change journal based backup feature is not crash proof: journal integrity is lost upon reboot, and a reboot event necessitates a complete new scan of all file system metadata and a re-synchronizing of file lists and stats with the backup server. Second, the feature can degrade file system performance. Third, the feature is supported only on certain versions of the WINDOWS operating system. Fourth, the feature does not address the metadata scanning problem for HSM. Fifth, the space required by the feature is potentially unbounded. That is, every change is recorded in the journal, so the journal keeps growing at a rate proportional to the rate of file system change. In practice, the journal is therefore periodically processed and trimmed by the storage management subsystem(s). However, the rate and amount of change can outpace the storage capacity of the journal and/or the processing cycles allocated to the storage management subsystem(s). When this “breakage” occurs, change information is lost, and the management system then has to resort to a traditional full metadata scan.
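The "breakage" behavior described above can be illustrated with a small model. This is a hedged sketch under stated assumptions: the class BoundedChangeJournal, its capacity parameter, and its methods are hypothetical and do not correspond to the NTFS change journal interface or to any backup product. It models a journal with a fixed storage capacity that records one entry per change and, once the capacity is exceeded before the journal is processed, signals that a full metadata scan is required.

```python
from collections import deque


class BoundedChangeJournal:
    """Toy model of a change journal with fixed capacity (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()
        self.overflowed = False

    def record_change(self, path):
        """Append one record per change; mark the journal broken on overflow."""
        if self.overflowed:
            return  # Change information is already being lost.
        if len(self.records) >= self.capacity:
            # The rate of change outpaced the journal: the record set is
            # incomplete, so it can no longer be trusted.
            self.overflowed = True
            self.records.clear()
            return
        self.records.append(path)

    def drain(self):
        """Process and trim the journal.

        Returns (changed_paths, needs_full_scan) and resets the journal.
        """
        if self.overflowed:
            self.overflowed = False
            return [], True
        # Collapse repeated changes to the same path, keeping first-seen order.
        paths = list(dict.fromkeys(self.records))
        self.records.clear()
        return paths, False
```

In this model, a consumer that calls drain() often enough receives only the changed paths, whereas a consumer that falls behind receives the needs_full_scan flag and has to fall back to the traditional full metadata scan.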