In a network backup environment, a client system may back up data to a remote storage device over a network and coordinate the backup with a storage management server. For instance, the International Business Machines (IBM®) Tivoli® Storage Manager product provides software for client and server systems to back up client data (IBM and Tivoli are registered trademarks of IBM). The client transfers files from its file system to the storage management server. The storage management server maintains a backup database having information on files sent to the storage management server.
When a file (i.e., data stream) is sent from the client to the server, file attributes (e.g., file size, file modification time, etc.) and ancillary data streams associated with the file (e.g., access control lists, extended attribute streams, generic alternate data streams, etc.) are also sent to the storage management server. The ancillary data streams associated with a file are usually unbounded in size and therefore cannot be stored as attributes in a database. Instead, the ancillary data streams are typically stored in disk or tape storage and are therefore transmitted within the file's data stream. The placement of the ancillary data streams in the file is arbitrary. That is, the ancillary data streams may be positioned in front of the file data or after the file data during data transmission.
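By way of illustration only, the transmission of ancillary data streams within a file's data stream, positioned either before or after the file data, may be sketched as follows. The framing shown here (a one-byte kind tag followed by a four-byte length) and the names `pack_file_stream` and `unpack_file_stream` are assumptions introduced for this sketch; no particular wire format is implied.

```python
import struct

# Hypothetical framing for this sketch: 1-byte kind tag + 4-byte big-endian length.
HEADER = struct.Struct(">BI")
FILE_DATA = 0
ANCILLARY = 1


def pack_file_stream(file_data, ancillary_streams, ancillary_first):
    """Combine file data and ancillary streams into one transmitted stream.

    The ancillary streams may be placed in front of or after the file data.
    """
    ancillary_parts = [(ANCILLARY, s) for s in ancillary_streams]
    data_part = [(FILE_DATA, file_data)]
    ordered = ancillary_parts + data_part if ancillary_first else data_part + ancillary_parts
    return b"".join(HEADER.pack(kind, len(payload)) + payload for kind, payload in ordered)


def unpack_file_stream(blob):
    """Separate a transmitted stream back into file data and ancillary streams."""
    file_data = b""
    ancillary_streams = []
    offset = 0
    while offset < len(blob):
        kind, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        payload = blob[offset:offset + length]
        offset += length
        if kind == FILE_DATA:
            file_data = payload
        else:
            ancillary_streams.append(payload)
    return file_data, ancillary_streams
```

Under these assumptions, a receiver recovers the same file data and ancillary streams regardless of whether the ancillary streams were placed first or last.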
In a progressive incremental backup system, a file object from a client is stored on a server during an initial backup. During a subsequent backup, the file object is not transmitted to the server unless the data, associated streams, or attributes have changed since the previous backup.
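The decision made at each subsequent backup may be sketched as follows. This is a minimal illustration only; the `FileState` record, the use of SHA-256 digests to detect changes, and the function names are assumptions introduced here and are not details of any particular product.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def digest(payload: bytes) -> str:
    """Fingerprint a byte string (assumed change-detection method)."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class FileState:
    """Hypothetical record of what was seen at a backup."""
    attributes: Dict[str, int]       # e.g., size, modification time
    data_digest: str                 # fingerprint of the file data
    stream_digests: Dict[str, str]   # fingerprint per ancillary stream


def snapshot(attributes, data, streams) -> FileState:
    return FileState(
        dict(attributes),
        digest(data),
        {name: digest(body) for name, body in streams.items()},
    )


def needs_backup(previous: Optional[FileState], current: FileState) -> bool:
    """Return True if the file object must be transmitted to the server."""
    if previous is None:
        return True  # initial backup: the file object is always stored
    return (
        current.attributes != previous.attributes
        or current.data_digest != previous.data_digest
        or current.stream_digests != previous.stream_digests
    )
```

In this sketch, an unchanged file object is skipped entirely, while a change to any one of the data, an associated stream, or an attribute marks the file object for backup.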
In current systems, if file attributes have changed, the file attributes are updated in the server database by overwriting the previous version. Thus, the storage management server has a copy of the file with only the current attributes, and there is no way to recover previous instances of the file attributes. Also, if either the file data or an associated stream has changed since the most recent backup, the entire file (attributes, file data and associated streams) is sent to the storage management server. Thus, even if only one associated stream has changed, and only by a small number of bytes, the file data and all associated streams are transmitted in their entirety.
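The cost of the behavior described above may be illustrated with a simplified model. The `FileObject` class, the dictionary-based server database and storage, and the function `backup_current_system` are hypothetical constructs introduced for this sketch; the bytes needed to transmit the attributes themselves are ignored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileObject:
    attributes: Dict[str, int]
    data: bytes
    streams: Dict[str, bytes]  # ancillary streams by name


def backup_current_system(server_db, server_storage, name, obj):
    """Model one subsequent backup; return the number of bytes transmitted.

    server_db holds a single attribute record per file (overwritten in place).
    server_storage holds a list of complete stored instances per file.
    """
    versions = server_storage.setdefault(name, [])
    latest = versions[-1] if versions else None

    if latest is not None and latest.data == obj.data and latest.streams == obj.streams:
        # Only attributes may have changed: overwrite, losing the prior version.
        server_db[name] = dict(obj.attributes)
        return 0

    # File data or any associated stream changed: the entire file is sent
    # and another complete instance is stored.
    versions.append(FileObject(dict(obj.attributes), obj.data, dict(obj.streams)))
    server_db[name] = dict(obj.attributes)
    return len(obj.data) + sum(len(s) for s in obj.streams.values())
```

For example, under this model a file with 1,000 bytes of data and a 10-byte access control list transmits all 1,010 bytes again when only the access control list changes, and the server then holds two complete instances of the file.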
The above approach unnecessarily creates another complete instance of the file, even though only a small portion of the file has changed and needs to be updated. This increases storage requirements and may cause an older version of the file to roll off so that earlier recovery points are eliminated.
Additionally, in current systems, in order to recover a small amount of stream information (for example, to recover a corrupted ACL), a restore of the entire file (file data and streams) is required.
Methods and systems are needed that can overcome the aforementioned shortcomings.