1. The Field of the Invention
The invention relates to data backup and recovery. Specifically, the invention relates to apparatus, methods, and systems for managing and formatting data in an autonomous data transfer operation.
2. The Relevant Art
Computer data is frequently stored in secondary, long-term storage devices such as disk drives and tape drives. Such data is often critical to the proper operation of various computer applications. While the data itself may have a very high priority, proper maintenance and preservation of the data through data backup and restore operations typically has a very low priority. Backup and recovery of data is often a lengthy process that requires significant bandwidth and server processing resources.
FIG. 1 illustrates a representative data management system 100 as is frequently used in the prior art for backing up and restoring data. A server 102 is connected to a network 104 and a separate storage area network (SAN) 106. A primary storage device such as a disk drive 108 and a secondary storage device such as a tape drive 110 communicate with the server 102 through the SAN 106.
The SAN 106 is a dedicated network comprising routers (not shown), switches (not shown), and the like that enable high-speed data transfers between devices connected to the SAN 106. Devices connected to the SAN 106, such as the server 102, disk drive 108, and tape drive 110, communicate using high-speed protocols such as Fibre Channel and/or Small Computer System Interface (SCSI). The server 102 may conduct high-bandwidth data transfers between the disk drive 108 and the tape drive 110 over the SAN 106, instead of the network 104.
The server 102 may serve as a file server, a print server, a web server, a database server, or the like. It is desirable to minimize the resources such as memory and processor cycles required from the server 102 for conducting a data backup or restore operation. Accordingly, certain conventional data management systems 100 now allow for “server-free” data transfers. The communication protocol for the SAN 106 in such an arrangement includes a data transfer command that allows the server 102 to initiate a data transfer and then return to servicing other processing requests while a third party, another device besides the server 102, executes the data transfer. The third party functions autonomously to conduct the data transfer. Consequently, such operations are also known as autonomous data transfers.
Generally, the third party is a data mover 112, a device connected to the SAN 106 that includes a processor and minimal memory. Data movers 112 may comprise routers, bridges, or the like. The data mover 112 is configured to execute a series of data transfer instructions.
The data transfer instructions are low-level instructions that designate a data source, a data destination, and a data size for a data transfer. The instructions are binary commands formatted according to the communication protocol of the SAN 106. Each instruction transfers one or more blocks of data from the source device to the destination device. The data size designates the number of blocks transferred. A data block is the smallest addressable data element that may be transferred between the source device and destination device. Generally, hardware manufacturers and/or the interface protocols for the source and destination devices determine the data block size. Data is transferred in blocks due to potential read/write restrictions of the source device or destination device.
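For illustration only, the following minimal sketch models such an instruction as a record holding a source, a destination, and a size in blocks. The names `TransferInstruction` and `execute`, the block address fields, and the in-memory device lists are all hypothetical; an actual instruction is a binary command formatted according to the SAN protocol, and the sketch assumes both devices share one block size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferInstruction:
    """Hypothetical model of one low-level data transfer instruction."""
    source: str        # identifier of the source device (assumed naming)
    destination: str   # identifier of the destination device (assumed naming)
    source_block: int  # first block to read on the source (assumed field)
    dest_block: int    # first block to write on the destination (assumed field)
    block_count: int   # data size, expressed as a number of blocks


def execute(instr, devices):
    """Copy block_count whole blocks from the source to the destination.

    `devices` maps a device identifier to a list of blocks (bytes objects).
    Only whole blocks are moved, since a block is the smallest
    addressable element.
    """
    src = devices[instr.source]
    dst = devices[instr.destination]
    for i in range(instr.block_count):
        dst[instr.dest_block + i] = src[instr.source_block + i]


# Illustrative devices with three 512-byte blocks each.
devices = {
    "disk": [b"A" * 512, b"B" * 512, b"C" * 512],
    "tape": [None, None, None],
}
execute(TransferInstruction("disk", "tape", 0, 0, 2), devices)
```

After the call, the first two blocks of the illustrative tape device hold copies of the first two disk blocks, and the third is untouched.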
One example of an autonomous data transfer operation is the SCSI third party extended copy command. This command includes a set of data transfer instructions defined within a command descriptor block (CDB). In operation, the server 102 generates and sends an initial CDB over the SAN 106 to the data mover 112. Based on the CDB, the data mover 112 executes an autonomous data transfer between the disk drive 108 and the tape drive 110 to back up data. The data mover 112 is also used to transfer data between the tape drive 110 and the disk drive 108 to restore data.
Such backups and restorations are conducted on data of various formats. The formats include logical formats such as volumes, files, folders, and the like. Other formats include physical formats such as blocks, sectors, tracks, and the like.
Generally, data management systems are required to back up and restore the data in a manner that leaves the data readily accessible, error-free, and unaltered once restored. Consequently, metadata is included within the data for identifying the data and checking its integrity. Conventionally, the metadata comprises data describing characteristics of other data, such as the type, size, error checking information, and any identifying information. Generally, a data management system should remove any metadata added to the user's data so that the restored data is unaltered when presented to a user.
Because users often desire privacy and security for their data, it is desirable that any backup operation that inserts metadata into a user's data stream also remove the metadata and restore the data to its original form. In this manner, the user can be confident that the data is secure and that privacy is preserved.
Unfortunately, due to the complexities involved, only limited amounts of metadata are included with the user's data in conventional autonomous data transfers. Typically, the metadata of conventional systems includes only a header and/or an end-of-data marker. No metadata is inserted into the data stream. Consequently, errors within the data stream may go undetected.
Certain types of server-free data transfers, such as the SCSI third party extended copy command, place metadata that is pre-generated by the server 102 in a header. Generally, to facilitate backup, recovery, and other data transfers, the physical data blocks of one storage device are sized such that one or more data blocks fit evenly within the blocks of another storage device. For example, the block size on the disk drive 108 and the block size on the tape drive 110 are generally such that data transferred between the disk drive 108 and the tape drive 110 ends on a block boundary. Including a header with one or more data blocks offsets the data blocks such that one or more of them are split by a block boundary. One may avoid this problem by making the header correspond to the size of one block of the destination device. Doing so, however, wastes space on the destination device. Inserting metadata within a transferred data stream further complicates the problem.
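The alignment problem can be made concrete with a small calculation. The sketch below is illustrative only: the function names `split_blocks` and `wasted_bytes` are hypothetical, and the block and header sizes (512-byte source blocks, 2048-byte destination blocks, a 64-byte header) are assumed values chosen for the example, not figures from any particular device.

```python
def split_blocks(header_size, src_block_size, dst_block_size, block_count):
    """Return indices of source blocks that straddle a destination boundary.

    The stream is assumed to be a header followed by block_count source
    blocks, written contiguously starting at offset 0 of the destination.
    """
    split = []
    for i in range(block_count):
        start = header_size + i * src_block_size
        end = start + src_block_size - 1  # offset of the block's last byte
        if start // dst_block_size != end // dst_block_size:
            split.append(i)
    return split


def wasted_bytes(header_size, dst_block_size):
    """Padding needed to round the header up to whole destination blocks."""
    return (-header_size) % dst_block_size


no_header = split_blocks(0, 512, 2048, 8)     # []
with_header = split_blocks(64, 512, 2048, 8)  # [3, 7]
padding = wasted_bytes(64, 2048)              # 1984
```

Under these assumed sizes, no block is split when there is no header, a 64-byte header causes the fourth and eighth source blocks to cross a destination boundary, and padding the header to a full destination block would consume 1984 otherwise unused bytes.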
Further unresolved problems lie with the instructions used to formulate the SCSI third party extended copy command. The instructions for the command are rote data transfer instructions to move data from location A to location B. Because the data transfer instructions are such low-level instructions, the instructions must be precisely ordered to properly complete the data transfer. If the instructions are inaccurate, significant time and resources may be wasted.
Neither the data mover 112 nor the autonomous data transfer protocol supports logical operations such as generating and inserting metadata within a transferred data stream. Consequently, while it may be desirable to insert metadata within the data stream to allow for more accurate data integrity checks, conventional technology would require the data mover 112 to contact the server 102 for each metadata element, defeating the autonomy of the data transfer.
Furthermore, the data mover 112 has very limited throughput and little memory for executing an autonomous data transfer. As a result, transferring data objects larger than the capacity of the data mover 112 causes the data mover 112 to fail to complete the operation. In addition, the data capacity may vary between different data movers 112. Conventionally, data movers 112 do not possess the logic required to divide an autonomous data transfer of large data objects into more manageable segments.
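To illustrate what dividing a large transfer into segments could look like, the following sketch splits a transfer of a given number of blocks into pieces no larger than a stated capacity. It is a hypothetical example, not a description of any conventional data mover: the function name `segment_transfer` and the idea of expressing capacity as a maximum block count per segment are assumptions made for the illustration.

```python
def segment_transfer(total_blocks, max_blocks_per_segment):
    """Yield (first_block, block_count) pairs that together cover total_blocks.

    max_blocks_per_segment stands in for the capacity of a particular
    data mover (an assumed way of expressing that capacity).
    """
    if max_blocks_per_segment <= 0:
        raise ValueError("max_blocks_per_segment must be positive")
    first = 0
    while first < total_blocks:
        count = min(max_blocks_per_segment, total_blocks - first)
        yield first, count
        first += count


# A 10-block object handled by a mover assumed to manage 4 blocks at a time.
segments = list(segment_transfer(10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

Because the capacity is a parameter, the same object could be divided differently for data movers of different capacities.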
Accordingly, what is needed is a system and method to overcome the problems and disadvantages of the prior art. In particular, the system and method should be able to insert metadata within, and extract metadata from, a data stream that is transferred using an autonomous data transfer to allow for more accurate data integrity checking. In addition, the system and method should back up and restore data, including embedded metadata, without altering the original data. Furthermore, the system and method should minimize wasted storage space on the destination storage device. The system and method should divide an autonomous data transfer into segments that are manageable by an available data mover. Finally, the system and method should allow an autonomous data transfer that includes splitting data blocks and/or segments across block boundaries of the destination storage device.