The term “archival data” as used herein refers generally to file backups or other types of non-primary information storage in a designated long-term storage system. Conventional archival data storage typically involves the regular backup of data from a computer or other client machine to an optical jukebox, redundant array of inexpensive disks (RAID) device, magnetic tape drive or other device in a long-term storage system.
A typical scenario involves providing backup as a central service for a number of client machines. Client software interfaces with a file system or database and determines what data to back up. The data is copied from the client to a storage device, often over a network, and a record of what was copied is stored in a catalog database.
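By way of illustration only, the scenario above may be sketched in Python as follows. The sketch is not drawn from any particular backup product. The names `open_catalog` and `backup`, the catalog schema, and the use of a content hash to decide what to back up are all assumptions made for the example. A dictionary stands in for the client file system and another for the storage device.

```python
import hashlib
import sqlite3


def open_catalog():
    """Create an in-memory catalog database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE catalog (run INTEGER, path TEXT, digest TEXT)")
    return db


def backup(client_files, storage, catalog, run):
    """Copy new or changed files to storage and record them in the catalog.

    client_files: dict of path -> bytes (stands in for the client file system)
    storage:      dict of (run, path) -> bytes (stands in for the storage device)
    Returns the list of paths copied during this run.
    """
    copied = []
    for path, data in sorted(client_files.items()):
        digest = hashlib.sha256(data).hexdigest()
        # Look up the most recent catalog entry for this path, if any.
        row = catalog.execute(
            "SELECT digest FROM catalog WHERE path = ? ORDER BY run DESC LIMIT 1",
            (path,),
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # assumed policy: skip files unchanged since the last copy
        storage[(run, path)] = data  # copy the data to the storage device
        catalog.execute(
            "INSERT INTO catalog VALUES (?, ?, ?)", (run, path, digest)
        )
        copied.append(path)
    catalog.commit()
    return copied
```

In this sketch, a second call to `backup` with one modified file copies only that file, and the catalog then holds one record per copy made across both runs.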
A more particular example of archival data storage of the type noted above is the file system in the computer environment known as Plan 9, as described in R. Pike et al., “Plan 9 from Bell Labs,” Computing Systems, Vol. 8, No. 3, pp. 221-254, Summer 1995, which is incorporated by reference herein. The Plan 9 file system stores archival data to an optical jukebox. The archival data is stored in the form of a “snapshot,” that is, a consistent read-only view of the file system at some point in the past. The snapshot retains the file system permissions and can be accessed using standard tools, and thus without special privileges or assistance from an administrator. Snapshots avoid the tradeoff between full and incremental backups. Each snapshot is a complete file system tree, much like a full backup. The implementation, however, resembles an incremental backup because the snapshots and the active file system share any blocks that remain unmodified. A snapshot therefore requires additional storage only for the blocks that have changed. To achieve reasonable performance, the device that stores the snapshots must efficiently support random access, which limits the suitability of tape storage for this approach.
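The block sharing described above may be illustrated, purely as a simplified sketch, in Python. This is not the Plan 9 implementation. The classes `BlockStore` and `FileSystem` are hypothetical, and a flat table mapping (path, block index) to a block address stands in for the file system tree. The sketch shows only that a snapshot copies references rather than blocks, so that a later write consumes storage for the changed block alone.

```python
from types import MappingProxyType


class BlockStore:
    """Append-only store of block contents, addressed by position."""

    def __init__(self):
        self.blocks = []

    def put(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1  # address of the newly stored block


class FileSystem:
    """Active file system plus read-only snapshots sharing one block store."""

    def __init__(self, store):
        self.store = store
        self.active = {}     # (path, block index) -> block address
        self.snapshots = []  # read-only copies of the address table

    def write(self, path, index, data):
        # Never overwrite in place: a new block is allocated so that any
        # snapshot still referring to the old block remains valid.
        self.active[(path, index)] = self.store.put(data)

    def snapshot(self):
        # Copy only the address table; the blocks themselves are shared.
        self.snapshots.append(MappingProxyType(dict(self.active)))
        return len(self.snapshots) - 1

    def read(self, path, index, snap=None):
        table = self.active if snap is None else self.snapshots[snap]
        return self.store.blocks[table[(path, index)]]
```

For example, after two blocks are written and a snapshot is taken, rewriting one block and taking a second snapshot adds a single block to the store, and the two snapshots refer to the same address for the block that did not change.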
Other known archival data storage systems include the Elephant file system, described in D. S. Santry et al., “Deciding when to forget in the Elephant file system,” Proceedings of the 17th Symposium on Operating Systems Principles, Dec. 12-15, 1999, and the Stanford Archival Vault, described in A. Crespo and H. Garcia-Molina, “Archival storage for digital libraries,” Proceedings of the 3rd ACM International Conference on Digital Libraries, 1998, both papers being hereby incorporated by reference herein.
Recent substantial increases in the capacity of various storage technologies are making it practical to archive data in perpetuity. However, conventional techniques such as those described above are generally not optimized for providing this type of storage. A need therefore exists for improved archival data storage techniques which better exploit the expected ongoing growth in available storage capacity.