The invention disclosed herein relates generally to performing integrated storage operations, including management of snapshots, backups and other storage operations performed on an information store. More particularly, the present invention relates to a system and method for combining different data snapshot and backup strategies for homogeneous data protection according to universal policies entered by a user or administrator.
To obtain a more thorough understanding of the present invention, the following discussion describes the manner in which information is typically stored on magnetic media. Using traditional techniques, backups of an information store are performed using the operating system's file system. Backup is done by accessing the operating system's (OS) file system for the information store to be backed up, such as the Windows NTFS file system. The file allocation system of the operating system typically uses a file allocation table to keep track of the physical or logical clusters across which each file in the information store is stored. Also called an allocation unit, a cluster is a given number of disk sectors that are treated as a unit, each disk sector storing a number of bytes of data. This unit, the cluster, is the smallest unit of storage the operating system can manage. For example, on a computer running Microsoft's Windows 95 operating system, the OS uses the Windows FAT32 32-bit file allocation table having a cluster size of 4K. The number of sectors in a cluster is determined when the disk is formatted by a formatting program, generally, but not necessarily, when the OS is installed.
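By way of illustration only, the relationship between sectors, clusters and file size described above can be sketched as follows. The 4K cluster size is taken from the example above; the 512-byte sector size and the function names are assumptions introduced for this sketch and are not taken from the discussion above.

```python
BYTES_PER_SECTOR = 512    # assumed sector size; not stated in the text
CLUSTER_SIZE = 4 * 1024   # the 4K cluster size from the example above


def sectors_per_cluster(cluster_size=CLUSTER_SIZE, bytes_per_sector=BYTES_PER_SECTOR):
    """Number of disk sectors that are treated as one cluster."""
    return cluster_size // bytes_per_sector


def clusters_needed(file_size, cluster_size=CLUSTER_SIZE):
    """Whole clusters occupied by a file of file_size bytes.

    The cluster is the smallest unit the OS can manage, so the result is
    rounded up: a 1-byte file still occupies one full cluster.
    """
    return -(-file_size // cluster_size)  # integer ceiling division


def slack_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Bytes allocated to the file but not used by its data."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size
```

Under these assumptions, a file that is one byte larger than a cluster occupies two clusters, and nearly all of the second cluster is unused.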
The operating system allocates disk space for a file only when needed. That is, the data space is not preallocated but allocated dynamically. The space is allocated one cluster at a time, where a cluster is a given number of consecutive disk sectors. The clusters for a file are chained together, and kept track of, by entries in a file allocation table (FAT).
The clusters are arranged on the disk to minimize the disk head movement. For example, all of the space on a track is allocated before moving on to the next track. This is accomplished by using the sequential sectors on the lowest-numbered cylinder of the lowest-numbered platter, then all sectors in the cylinder on the next platter, and so on, until all sectors on all platters of the cylinder are used. This is performed sequentially across the entire disk; for example, the next sector to be used will be sector 1 on platter 0 of the next cylinder.
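The allocation order just described may be pictured with the following minimal sketch. The function name and the tiny disk geometry in the usage note are hypothetical; the numbering (platters from 0, sectors from 1) follows the example in the preceding paragraph.

```python
def allocation_order(cylinders, platters, sectors_per_track):
    """Yield (cylinder, platter, sector) in the order space is used.

    All sectors of a track are used first, then the same cylinder on the
    next platter, and only then does the head move to the next cylinder.
    """
    for cylinder in range(cylinders):
        for platter in range(platters):
            for sector in range(1, sectors_per_track + 1):
                yield (cylinder, platter, sector)
```

For a hypothetical disk with two cylinders, two platters and two sectors per track, the fifth location produced is sector 1 on platter 0 of cylinder 1, that is, the first sector of the next cylinder.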
For a hard (fixed) disk, the sizes of the FAT, the sectors, the clusters, etc. are determined when a disk formatting program formats the disk, and are based on the size of the partition. To locate all of the data that is associated with a particular file stored on a hard disk, the starting cluster of the file is obtained from the directory entry, then the FAT is referenced to locate the next cluster associated with the file. Essentially, the FAT is a linked list of pointers to clusters on the disk, e.g., each 16-bit FAT entry for a file points to the next sequential cluster used for that file. The last entry for a file in the FAT has a number indicating that no more clusters follow. This number can be from FFF8 to FFFF (base 16) inclusive.
FIG. 1 shows an example directory entry 2 of a Windows-formatted hard disk and accompanying FAT 20. The exemplary directory entry 2 consists of 32 bytes of data. The name of the file and its extension are stored in the first eleven bytes 4 of the directory entry 2 and a file attribute byte 6 is provided. By definition, ten bytes 8 are reserved for future use and four bytes are provided to store time 10 and date 12 information (two bytes each). Two cluster bytes 14 point to the first cluster of sectors used to store the file information. The last four bytes 18 of the directory entry 2 are used to store the size of the file.
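For illustration, the 32-byte layout of directory entry 2 can be decoded as sketched below. The field widths follow the description above; the little-endian byte order, the field names, and the sample file name and size used in the usage note are assumptions made for this sketch.

```python
import struct
from collections import namedtuple

# 8-byte name + 3-byte extension (the first eleven bytes), 1 attribute byte,
# 10 reserved bytes, 2 time bytes, 2 date bytes, 2 cluster bytes, 4 size bytes.
# "<" selects little-endian with no padding (byte order is an assumption here).
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")

DirEntry = namedtuple(
    "DirEntry", "name ext attr reserved time date first_cluster size"
)


def parse_dir_entry(raw):
    """Decode one 32-byte directory entry into named fields."""
    if len(raw) != DIR_ENTRY.size:
        raise ValueError("a directory entry is exactly 32 bytes")
    entry = DirEntry(*DIR_ENTRY.unpack(raw))
    # Names are space-padded on disk; strip the padding for readability.
    return entry._replace(
        name=entry.name.decode("ascii").rstrip(),
        ext=entry.ext.decode("ascii").rstrip(),
    )
```

A hypothetical entry built with `DIR_ENTRY.pack(b"REPORT  ", b"TXT", 0x20, bytes(10), 0, 0, 2, 6000)` decodes to a file named REPORT.TXT whose first cluster is 2 and whose size is 6000 bytes.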
A sixteen-byte section of a FAT 20 is depicted. The first four bytes 21 store system information. A two-byte pair, bytes four and five (16), is the first pair of bytes of the FAT 20 used to track file information. The first cluster for data space on all disks is cluster “02.” Therefore, bytes four and five (16) are associated with the first cluster of disk sectors “02” used to store file information. Bytes six and seven (22) are associated with cluster “03” . . . and bytes fourteen and fifteen (24) are associated with cluster “07.”
This example illustrates how sectors associated with a file referenced in a directory are located. The cluster information bytes 14 in the directory 2 point to cluster number “02.” The sectors in cluster “02” (not shown) contain the initial sector of data for the referenced file. Next, the FAT is referenced to see if additional clusters are used to store the file information. FAT bytes four and five (16) were pointed to by the cluster information bytes 14, and the information stored in bytes four and five (16) in the FAT 20 points to the next cluster used for the file. Here, the next cluster is “05.” Accordingly, cluster “05” contains the next sector of data for the referenced file. FAT bytes ten and eleven (26), which are associated with cluster “05,” contain an end-of-file flag, “FFFF,” indicating there are no more clusters associated with the referenced file. All of the information comprising the referenced file, therefore, is contained in clusters “02” and “05” on the disk.
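The lookup just described amounts to following a linked list, and may be sketched as follows. The sketch assumes 16-bit little-endian FAT entries, with the entry for cluster n stored at byte offset 2n, which matches the byte positions given above. The function names, the placeholder system-information bytes, and the zero value used for unused entries are assumptions for illustration.

```python
import struct

END_OF_CHAIN_MIN = 0xFFF8  # entries from FFF8 to FFFF mean "no more clusters"


def fat_entry(fat, cluster):
    """Return the 16-bit FAT entry associated with a cluster number."""
    return struct.unpack_from("<H", fat, 2 * cluster)[0]


def cluster_chain(fat, first_cluster):
    """Follow the FAT from a file's first cluster to its end-of-file flag."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        if cluster in seen:
            raise ValueError("FAT chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat_entry(fat, cluster)
        if nxt >= END_OF_CHAIN_MIN:
            return chain
        if nxt < 2:
            raise ValueError("FAT chain points at a non-data cluster")
        cluster = nxt


# Sixteen-byte FAT section modeled on the example: bytes 0-3 hold system
# information (placeholder values), cluster 02 points to 05, cluster 05 is FFFF.
EXAMPLE_FAT = struct.pack("<8H", 0xFFF8, 0xFFFF, 5, 0, 0, 0xFFFF, 0, 0)
```

Calling `cluster_chain(EXAMPLE_FAT, 2)` returns `[2, 5]`, the two clusters that hold the referenced file in the example.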
As with other applications running on the computer, a typical copy application provides a read request to the operating system, which handles interpretation of the information contained in the FAT and reading of each file for the copy application. A file system is provided on the destination storage device that is used by the copy application to write files. Similarly, the recovery portion of the copy application, or a separate recovery application, may read files from the destination storage device for recovery of the information.
One currently available alternative is to perform snapshots of an information store. With current snapshot systems and methods, administrators create an incremental copy that is an exact point-in-time replica of the source volume each time a snapshot is taken. The snapshot is stored locally on the information store from which it was taken and tracks incremental changes to the data in the information store. Furthermore, changed data is written to a new location in the information store as tracked by the snapshot. With knowledge regarding the change, as well as the changed data, the snapshot can be used to “roll back” changes to an information store to the point in time when the snapshot was taken. If there should be any logical corruption in the information store's data that went undetected for a period of time, however, these incremental updates faithfully replicate that logical corruption to the data when copying. Additionally, other drawbacks are associated with currently known snapshot techniques, including the significant drawback that restoration from the snapshot is impossible in the event that the information store fails, as both the snapshot and the information store become unavailable.
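One way to picture the snapshot behavior described above is the following toy model, in which changed data is written to a new location and a snapshot is a frozen copy of the map from logical blocks to locations. This is a simplified sketch under stated assumptions, not a description of any particular snapshot product; the class and method names are hypothetical.

```python
class InformationStore:
    """Toy store in which every write lands in a new location."""

    def __init__(self):
        self.locations = []  # physical locations; existing ones are never overwritten
        self.block_map = {}  # logical block number -> index into locations

    def write(self, block, data):
        # Changed data goes to a new location; the old data stays in place.
        self.locations.append(data)
        self.block_map[block] = len(self.locations) - 1

    def read(self, block):
        return self.locations[self.block_map[block]]

    def snapshot(self):
        # The snapshot is only an index of where each block lived at this time.
        return dict(self.block_map)

    def roll_back(self, snap):
        # Restoring the old index undoes every change made after the snapshot.
        self.block_map = dict(snap)
```

Because the snapshot in this model only refers to locations held inside the same store, losing the store loses the snapshot as well, which is the drawback noted above.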
Another technique known in the art is serverless or extended copy. Extended copy systems utilize intelligent devices to perform copying of an information store, e.g., a disk array attached to a database server. These systems conduct copying without requiring the use of a copy server or the server whose data is being copied to move a data stream from source volume to a storage device, such as a tape drive. The intelligent device is typically a data router (SCSI to Fibre Channel bridge), or other network infrastructure device, that is in communication with an information store or other storage devices, e.g., in a storage area network (“SAN”). Recently, a set of extended copy commands has been incorporated into storage devices themselves, which are connected to the SAN. In this case, the server tells the storage device which data in an information store to copy and the storage device then moves the data directly from the information store to the storage device. Since there is no server involved in transporting the data stream, this copy method is known as serverless.
Over time, an organization may employ different combinations of techniques known in the art to perform backups, snapshots and other copying of information stores. Unfortunately, none of these techniques, or the systems that implement them, are compatible with one another, which has heretofore prevented the use of universal policies to leverage multiple techniques known in the art in a unified fashion. The problem of not having a universal backup or snapshot policy becomes especially pronounced when applied to a multiple-host environment; for example, several hosts coupled to a storage area network (SAN). In these environments, for example, it would be advantageous to periodically perform a full copy of an information store, which organizes the copy by file names and folders, and includes the capability to restore individual data file names and folders. Yet, it is also advantageous to perform periodic snapshots of an information store, where each snapshot is an index of the data contained on an information store at the point in time when the snapshot was taken.
The incompatibility of current systems causes significant difficulty in host network administration. For example, if a host administrator wishes to restore a full copy or snapshot of an information store, the correct software must be loaded, and a copy or snapshot created at some point in time must be selected for restoring. In the case of restoring a full copy, selected files may be restored. In the case of a snapshot, the administrator is typically limited to restoring a whole snapshot. If, for example, recovery of backup files or a snapshot is desired because of a discovered problem with a volume, the selected copy or snapshot should be one that was created before the problem occurred. Even after finding a seemingly correct copy or snapshot to restore, however, it is quite possible that a different snapshot or copy system created a copy or snapshot that was performed even later in time, but before the problem occurred, which would be more desirable to restore. Nevertheless, the administrator has no way to determine if this is the case.
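To make the difficulty concrete, the following sketch shows the selection an administrator would like to make if the catalogs of the separate copy and snapshot systems could be examined together. It is an illustration only; the catalog structure, the system names, and the dates in the usage note are hypothetical.

```python
from datetime import datetime


def best_restore_point(catalogs, problem_time):
    """Return the latest copy or snapshot taken before the problem occurred.

    catalogs maps a system name to a list of (kind, time_taken) pairs.
    The result is a (time_taken, system, kind) tuple, or None if no copy
    or snapshot predates the problem.
    """
    candidates = [
        (taken, system, kind)
        for system, entries in catalogs.items()
        for kind, taken in entries
        if taken < problem_time
    ]
    return max(candidates, default=None)
```

With a hypothetical full-copy system whose last copy was taken on January 8 and a snapshot system whose last clean snapshot was taken on January 9, a problem arising on January 10 is best recovered from the January 9 snapshot. An administrator who can consult only the full-copy system's catalog would settle for the older January 8 copy.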
Related to this problem, currently incompatible backup systems do not leverage each other's capabilities. For example, a snapshot cannot be leveraged to perform typical operations performed using a full copy, including file or folder level restoration. Even when only snapshots are used, those created by different software systems may not be compatible with one another, and therefore one type of snapshot system cannot leverage the capabilities of the other.
Thus, there is a need for systems and methods employing policies to determine the time and order in which one or more storage operations are performed on one or more data stores.