Enterprises utilize backup storage systems to protect data on their computer systems from loss by copying the data of the computer system and storing it at the backup storage system. The process relies on a client application being executed at each computer system to be backed up to the backup storage system. The client marshals the data to be backed up and transmits it to a backup server that stores it in a set of storage devices local to or in communication with the backup server.
Because of the large amounts of data to be backed up, the backup server and client may deduplicate the backed up data, compress it, or otherwise reduce the amount of storage it requires. Likewise, the backup client can be selective in the data it sends to the backup server, for example by sending only data that has changed since the last backup operation, or can compress the data before sending it. The communication between the backup client and the backup server or the backup storage devices can be compliant with the Network Data Management Protocol (NDMP).
The backup client can support recovery of the backed up data by allowing a user to select files from the backed up data; the selected data is then sent to the backup client. To facilitate this functionality, the backup server maintains an index of the files that can be separately retrieved from the backup storage system. The index is created as the files are received from the backup client. The backup client can also facilitate indexing by sending index information separately from the file data to be backed up. At the backup server, the index data is stored in an index database (“indexdb”). Where NDMP is utilized, the indexes generated at the backup client and sent to the backup server by NDMP are usually out of order (that is, not in depth-first order), so they must be converted into ordered indexes before they can be committed to the indexdb.
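The conversion from out-of-order index entries to depth-first order can be illustrated with a minimal sketch. The record layout used here, a tuple of (node_id, parent_id, name), and the function name depth_first_order are hypothetical and chosen only for illustration; they are not taken from NDMP or from any particular backup product. The sketch groups entries by parent and then walks the tree from an assumed root identifier of 0.

```python
from collections import defaultdict


def depth_first_order(entries, root_id=0):
    """Return full paths in depth-first (pre-order) order.

    entries: iterable of (node_id, parent_id, name) tuples in any order.
    The tuple layout and root_id=0 are assumptions made for this sketch.
    """
    # Group every entry under its parent so the tree can be walked later.
    children = defaultdict(list)
    for node_id, parent_id, name in entries:
        children[parent_id].append((name, node_id))

    ordered = []
    # An explicit stack avoids recursion limits on very deep trees.
    stack = [(root_id, "")]
    while stack:
        node_id, path = stack.pop()
        if path:  # the root itself has no path to emit
            ordered.append(path)
        # Push in reverse-sorted order so siblings pop in sorted order.
        for name, child_id in sorted(children.get(node_id, []), reverse=True):
            stack.append((child_id, f"{path}/{name}"))
    return ordered


# Entries arrive with children before parents and siblings interleaved.
out_of_order = [
    (4, 2, "b.txt"),
    (2, 0, "docs"),
    (3, 2, "a.txt"),
    (1, 0, "bin"),
    (5, 1, "tool"),
]
ordered_paths = depth_first_order(out_of_order)
for p in ordered_paths:
    print(p)
```

Note that the sketch must hold every entry in memory before it can emit the first ordered path, which is one way to see why reordering carries the space and time costs discussed next.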
Converting out-of-order indexes to ordered indexes incurs overhead in temporary storage space, central processing unit (CPU) time, and memory. In typical NDMP-based backups, this additional processing time is measured in hours after the backup. For example, the reordering process can take 40 minutes for 10 million files, 1 hour 20 minutes for 20 million files, and 2 hours 5 minutes for 30 million files. The time increases approximately linearly with the number of files. File systems with hundreds of millions of files are now common, so index reordering can add substantially to the total backup time. In addition to the processing time, significant storage space is required to track the files to be indexed. For example, the amount of space required can be: 2*(144+average file name length)*number of entries in the file system.
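As a worked illustration of the figures above, the following sketch evaluates the space formula and the per-file reordering rate implied by the three quoted timings. The function name reorder_space and the sample average file name length of 16 are assumptions made for this example; the text does not state the unit of the result, so treating it as bytes is also an assumption.

```python
def reorder_space(num_entries, avg_name_len):
    """Space estimate from the formula in the text:
    2 * (144 + average file name length) * number of entries.
    The unit is not stated in the text; bytes is assumed here.
    """
    return 2 * (144 + avg_name_len) * num_entries


# Hypothetical file system: 10 million entries, 16-character average name.
space = reorder_space(10_000_000, 16)
print(space)

# Quoted timings: (millions of files, minutes of reordering).
timings = [(10, 40), (20, 80), (30, 125)]
minutes_per_million = [minutes / millions for millions, minutes in timings]
print(minutes_per_million)
```

Under these assumptions the estimate is 3,200,000,000, or roughly 3.2 GB if the unit is bytes, and the quoted timings work out to about 4 minutes per million files at each size, consistent with approximately linear growth.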