1. Field
Embodiments of the invention relate to controlling incremental backups using object attributes.
2. Description of the Related Art
Incremental backup may be described as a process of backing up new and changed objects (e.g., files, directories, etc.), rather than backing up all objects. That is, the objects that remain unchanged since a last incremental backup are not backed up. A set of object attributes is typically associated with an object, and the set of object attributes is used to determine whether the object is a candidate for backup.
Tivoli® Storage Manager (TSM) is available from International Business Machines Corporation. TSM enables an organization to protect data from failures and other errors by storing backup data in a hierarchy of offline storage.
In some systems, the object attributes may be interpreted by the backup system. That is, the backup system understands which object attributes are relevant to determining whether incremental backup of the object should be performed. Relevant object attributes are ones that indicate that object content has changed and that incremental backup should be performed. With some backup systems, such as TSM, when the backup system determines whether to perform an incremental backup of an object, the backup system compares current object attributes of the object to object attributes previously stored at a server computer (i.e., the object attributes stored at the time of the last incremental backup). If the backup system is able to interpret the object attributes, the backup system is able to determine that a difference in certain object attributes, such as object size or modification time, indicates that the object should be backed up (i.e., that the object is to be resent to the server computer for backup), and is able to determine that differences in other object attributes, such as object ownership and permissions, indicate that metadata is to be updated on the server computer, without performing an incremental backup of the object.
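By way of illustration only, the comparison described above may be sketched as follows for a backup system that is able to interpret individual object attributes. The attribute names (`size`, `mtime`, `owner`, `permissions`), the `Action` enumeration, and the `decide` function are assumptions made for this sketch and do not reflect the interface of any particular backup product.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                        # nothing changed
    UPDATE_METADATA = "update_metadata"  # refresh attributes on the server only
    BACKUP = "backup"                    # resend the object to the server


# Hypothetical split of attributes into those that reflect a content change
# and those that only reflect a metadata change.
CONTENT_ATTRS = ("size", "mtime")
METADATA_ATTRS = ("owner", "permissions")


def decide(current: dict, stored: dict) -> Action:
    """Compare current attributes with those stored at the last backup."""
    if any(current.get(k) != stored.get(k) for k in CONTENT_ATTRS):
        return Action.BACKUP
    if any(current.get(k) != stored.get(k) for k in METADATA_ATTRS):
        return Action.UPDATE_METADATA
    return Action.SKIP


stored = {"size": 4096, "mtime": 1000, "owner": "alice", "permissions": 0o644}
chown_only = dict(stored, owner="bob")
rewritten = dict(stored, size=8192, mtime=2000)

print(decide(stored, stored))      # Action.SKIP
print(decide(chown_only, stored))  # Action.UPDATE_METADATA
print(decide(rewritten, stored))   # Action.BACKUP
```

In this sketch, a change of ownership alone results in a metadata update rather than a resend of the object, which is the distinction that is lost when the attributes are not interpretable.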
Some object systems, such as a General Parallel File System (GPFS) available from International Business Machines Corporation and a Storage Area Network (SAN) Object, return object attributes as an opaque buffer to the backup system. An opaque buffer may be described as one for which the backup system does not know the structure, and, thus, is not able to interpret (i.e., the backup system is not able to identify separate object attributes in the opaque buffer). Some changed object attributes within the opaque buffer trigger incremental backup of the object, while other changed object attributes may be ignored because they do not reflect a change in object content. Since the backup system is not aware of the buffer structure, the backup system cannot distinguish object attributes that trigger incremental backup from those that can be ignored. Therefore, the backup system uses the entire opaque buffer or a checksum of the opaque buffer to determine whether to perform incremental backup of the object. This leads to unnecessary incremental backups when only object attributes that may be ignored are changed.
Instead of using an opaque buffer, object attributes may be stored in an opaque Binary Large Object (BLOB), which may also be referred to as a buffer. Using an opaque BLOB to communicate object attributes to the backup software is a technique that is too coarse-grained. With existing backup systems, if there are object attributes included in the opaque BLOB that should not result in object content being backed up, the object system does not notify the backup system of this. Furthermore, if there is rich extended object system metadata associated with the opaque BLOB of an object, such as there is with a Storage Area Network File System (SANFS), then immaterial changes to object system state, such as which storage tier is used to store extents of an object, result in the object content being backed up unnecessarily.
A checksum may be described as a form of redundancy check. One type of checksum is a cyclic redundancy check (CRC). A CRC may be described as a type of hash function applied to data to generate a checksum, which is typically a small number of bits. The CRC is computed and appended to the data before transmission or storage. The CRC appended to the data is verified by a recipient of the data to confirm that no changes occurred in transit or in storage. A CRC may also enable correction of the data if the information lost is less than the information held by the checksum. Thus, the CRC is used to detect, and in some cases correct, errors after transmission or storage.
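By way of illustration only, appending a CRC to data and verifying it on receipt may be sketched as follows. The sketch uses the CRC-32 function of the Python standard library (`zlib.crc32`); the function names `append_crc` and `verify_crc` and the choice of a 4-byte big-endian trailer are assumptions made for this sketch. The sketch covers error detection only and does not attempt correction.

```python
import struct
import zlib


def append_crc(data: bytes) -> bytes:
    """Return the data followed by its CRC-32 as a 4-byte big-endian trailer."""
    return data + struct.pack(">I", zlib.crc32(data))


def verify_crc(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return struct.unpack(">I", trailer)[0] == zlib.crc32(payload)


frame = append_crc(b"object attributes")
# Flip one bit of the first payload byte to simulate corruption.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(verify_crc(frame))      # True
print(verify_crc(corrupted))  # False
```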
Some backup systems use a checksum on the opaque BLOB to determine whether to perform an incremental backup. A backup system, such as TSM, receives an opaque BLOB for an object, computes a checksum over the entire opaque BLOB, and stores the checksum and the size of the opaque BLOB in a server computer repository. When another opaque BLOB is received for the object on the next backup, the backup system generates another checksum and compares this checksum to the stored checksum. In particular, if the checksum on the newly received opaque BLOB matches the previously stored checksum on a previously received opaque BLOB, then the incremental backup is not performed, as the object associated with the opaque BLOB is determined not to have changed.
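By way of illustration only, the checksum comparison on an opaque BLOB may be sketched as follows. The buffer layout produced by `pack_attrs` (object size, modification time, and storage tier) is a hypothetical layout known only to the object system in this sketch; the backup-side functions `fingerprint` and `needs_backup` see only bytes. CRC-32 is assumed as the checksum, and all names are assumptions made for this sketch.

```python
import struct
import zlib


def pack_attrs(size: int, mtime: int, tier: int) -> bytes:
    """Object-system side: hypothetical layout the backup system cannot interpret."""
    return struct.pack("<QQB", size, mtime, tier)


def fingerprint(blob: bytes) -> tuple:
    """Backup side: the checksum and size that would be stored at the server."""
    return (zlib.crc32(blob), len(blob))


def needs_backup(blob: bytes, stored) -> bool:
    """Back up when nothing is stored yet or the fingerprint differs."""
    return stored is None or fingerprint(blob) != stored


before = pack_attrs(size=4096, mtime=1_700_000_000, tier=0)
stored = fingerprint(before)

# Only the storage tier changes; the object content is untouched.
after_tier_move = pack_attrs(size=4096, mtime=1_700_000_000, tier=1)

print(needs_backup(before, stored))           # False
print(needs_backup(after_tier_move, stored))  # True
```

In this sketch, moving the object to a different storage tier changes the checksum and therefore triggers a backup even though the object content is unchanged, because the backup side has no way to tell which bytes of the BLOB changed.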
Some backup systems use a checksum to detect changes to a collection of object attributes stored as an Access Control List (ACL), which includes information on access rights for an object. Again, this solution is too coarse-grained. For example, if an ACL for an object includes the fact that the object is not READ accessible by some users, this information is not relevant to the object contents being incrementally backed up. Without detailed knowledge of the ACL, the backup system is not able to decide whether the object contents have changed or not, although some object attributes in the ACL (e.g., whether the object is READ accessible by some users) may have changed. Therefore, the backup system reacts in a conservative manner and performs an incremental backup of the object.
Some systems allow a file system to indicate to the backup system, using an archive bit, whether changes require a new incremental backup. However, simple changes to the metadata of an object (e.g., changing access rights in the ACL of an object) result in the archive bit being set for the object. Thus, the archive bit is too coarse-grained in that the archive bit represents both content and metadata changes to an object.
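By way of illustration only, the limitation of an archive bit may be sketched with a toy in-memory model. The `ToyFile` class and its methods are hypothetical and do not correspond to any real file system interface; the flag value 0x20 is chosen to match the Windows `FILE_ATTRIBUTE_ARCHIVE` constant. Following the behavior described above, the model sets the bit on both content writes and ACL changes.

```python
ARCHIVE = 0x20  # same value as the Windows FILE_ATTRIBUTE_ARCHIVE flag


class ToyFile:
    """Toy model: a single bit records that 'something' changed."""

    def __init__(self, content: bytes, acl):
        self.content = content
        self.acl = set(acl)
        self.flags = 0

    def write(self, content: bytes) -> None:
        self.content = content
        self.flags |= ARCHIVE   # content change sets the bit

    def set_acl(self, acl) -> None:
        self.acl = set(acl)
        self.flags |= ARCHIVE   # metadata change sets the same bit

    def backup_candidate(self) -> bool:
        return bool(self.flags & ARCHIVE)

    def mark_backed_up(self) -> None:
        self.flags &= ~ARCHIVE  # the backup system clears the bit


f = ToyFile(b"payload", acl={"alice:rw"})
f.set_acl({"alice:rw", "bob:r"})  # metadata-only change
print(f.backup_candidate())       # True, although the content is unchanged
```

Because the single bit carries no indication of which kind of change occurred, a backup system reading it cannot separate the metadata-only case from a content change.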
Thus, there is a need in the art for improved incremental backups for systems in which the backup system receives an opaque buffer or BLOB.