Many organizations are upgrading their data backup systems to content-addressed storage (“CAS”) systems like Centera®. (Centera is a trademarked product owned and marketed by the assignee of the present invention.) In a CAS system, content-addressed data objects are not organized in a multi-tiered directory file structure; rather, all data objects are stored in a flat, single-tiered directory without folders or sub-folders. As such, each data object on a CAS storage system has a single name, without any additional path, directory or folder information. In contrast, in traditional location-based backup systems, files are stored within tiered folders, and file names often include references to a path or folder. Because a flat directory structure provides faster data object backup and retrieval, CAS storage systems offer a more efficient means of archiving, backing up, storing and retrieving data.
In a CAS storage system, data object names are provided by the system based upon the content and/or context of the file, using associated metadata such as creation date, creator name, project name, etc. Any alteration of a data object will change its content and/or associated metadata, and thus its name. The altered data object is then backed up as a new data object rather than overwriting a pre-existing data object. Thus, this taxonomy not only makes the archiving and organization of backup files more efficient, but also backs up all versions of each file. However, the use of a flat directory with system-provided data object names makes it difficult for the user to navigate the directory to locate and restore individual data objects.
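The naming behavior described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a name is the SHA-256 digest of the object's content together with its metadata; the actual naming scheme of any particular CAS product is not specified here, and `object_name` and `FlatStore` are hypothetical names.

```python
import hashlib
import json


def object_name(content: bytes, metadata: dict) -> str:
    """Derive a path-less name from content plus metadata (hypothetical scheme)."""
    digest = hashlib.sha256()
    digest.update(content)
    # Serialize metadata deterministically so equal metadata hashes equally.
    digest.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


class FlatStore:
    """A single-tier store: one dictionary, no folders or sub-folders."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes, metadata: dict) -> str:
        name = object_name(content, metadata)
        # A changed object hashes to a new name, so it is stored alongside
        # the earlier version instead of overwriting it.
        self.objects[name] = (content, metadata)
        return name


store = FlatStore()
meta = {"creator": "alice", "project": "demo"}
v1 = store.put(b"draft one", meta)
v2 = store.put(b"draft two", meta)  # altered content, same metadata
```

Under these assumptions, `v1` and `v2` are different names and both versions remain in `store.objects`, which mirrors how every version of a file is retained.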
This difficulty usually appears when an organization wishes to perform a secondary backup of its primary CAS storage system to a secondary backup system, and when the user subsequently needs to restore data objects from that secondary backup system.
Secondary backups for archiving primary CAS storage systems are managed by backup utility software applications known as data mover agents (DMAs). One backup command executed by a DMA to initiate secondary backup of a CAS storage system to tape is the “dump” command, a computer instruction that causes the CAS storage system to dump its contents to a tape server or equivalent storage device. However, as noted previously, CAS storage systems retain data objects in a single flat directory, organized by path-less data object names, and not by the original user-defined file names. A dump-based directory (typically called a “file history”) therefore appears as a list of data objects whose names look like long strings of random characters and are nonintuitive to the user. A user looking at this directory of archived data objects will have difficulty finding individual files to recover because the listing does not show the original file names that the user is familiar with.
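The navigation problem can be shown with a short sketch. It assumes a hypothetical file history built from SHA-256 digests of file contents; the `originals` mapping, `dump_file_history`, and `find_by_original_name` are illustrative names and do not represent the behavior of any real DMA or dump command.

```python
import hashlib

# Hypothetical originals: user-visible paths mapped to file contents.
originals = {
    "/projects/q3/report.doc": b"quarterly report",
    "/projects/q3/budget.xls": b"budget figures",
}


def dump_file_history(files: dict) -> list:
    """Return the flat list of path-less object names a dump might record."""
    return sorted(hashlib.sha256(data).hexdigest() for data in files.values())


def find_by_original_name(history: list, filename: str) -> list:
    """Search the file history for entries containing the user's file name."""
    return [name for name in history if filename in name]


history = dump_file_history(originals)
matches = find_by_original_name(history, "report.doc")
```

Here `history` holds two 64-character hexadecimal names and `matches` is empty, because nothing in the listing carries the original path or file name.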
Additionally, some DMA applications require specific fields in the metadata of the data objects. The primary CAS storage system may not generate these required fields, making the backup of some primary CAS storage systems incompatible with certain DMA applications. For example, some DMA applications manage data objects by using inodes, which are integer numbers added to each file as additional metadata, with each inode uniquely associated with one data object. Such a DMA cannot archive data objects from a CAS storage system whose metadata does not include inode data.
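The incompatibility can be sketched as a simple metadata check. The example assumes metadata is represented as a dictionary and that the required field is stored under the key `"inode"`; both the key name and the `unarchivable` helper are hypothetical and are not taken from any actual DMA application.

```python
from typing import Dict, List


def unarchivable(objects: Dict[str, dict]) -> List[str]:
    """List object names an inode-based DMA could not archive (illustrative).

    An object is rejected if its metadata lacks an integer inode or if its
    inode is already associated with another data object.
    """
    seen = set()
    rejected = []
    for name, metadata in objects.items():
        inode = metadata.get("inode")
        is_integer = isinstance(inode, int) and not isinstance(inode, bool)
        if not is_integer or inode in seen:
            rejected.append(name)
        else:
            seen.add(inode)
    return rejected


# Hypothetical objects: one valid, one with no inode, one with a duplicate.
objects = {
    "a3f1": {"creator": "alice", "inode": 1001},
    "9bc4": {"creator": "bob"},
    "77de": {"creator": "carol", "inode": 1001},
}
rejected = unarchivable(objects)
```

In this sketch, `"9bc4"` is rejected because the CAS system never generated an inode for it, and `"77de"` is rejected because its inode is not unique.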