In the field of data storage and recovery, storage area networks (SANs) are increasingly used to store data because of their high speed, reliability, and relative fault tolerance.
A SAN is a high-speed network that separates storage traffic from other types of network traffic. Prominent standards used for data storage in the conventional art are Small Computer Systems Interface (SCSI), Fibre Channel (FC), Serial Attached SCSI (SAS), and ATA/SATA. The redundant array of independent disks (RAID) standard is used for creating fault-tolerant data storage.
State-of-the-art storage systems comprise a SAN of storage devices and host nodes, in many instances connected together by a Fibre Channel (FC) switch. Other variations of SAN architecture use a different transport protocol, such as high-speed Ethernet, in place of the FC protocol for the storage network.
In addition to the above, more recent enhancements such as iSCSI and Fibre Channel over IP (FCIP) have enabled data storage networks to be managed as two or more networked SAN storage islands connected into a larger network by an IP tunnel. In these cases, which are not yet widely practiced, TCP/IP is leveraged together with encapsulation (frame packaging) methods to enable data storage devices to communicate with each other efficiently over the Internet in a dedicated manner.
In a typical application, data generated by nodes on a host network is written to primary storage on a SAN. Data from the primary storage system is then typically archived or backed up to tape media in batch mode at the end of a work period. Typically, a larger number of data-generating machines in a host network, such as PCs and servers, back up their data to a smaller number of mass storage devices, such as a tape library. For many applications leveraging an off-site storage solution, data written to a primary storage system is transferred to one or more tape drive systems as described above for archiving to magnetic tape media, which can be securely stored off-location on behalf of an enterprise.
A typical problem with the backup operation (writing the data to tape) is that the data generated by some machines during any work period can be of very large volume and can take considerably longer to back up than data from other machines. The backup window for a specific host can range anywhere from 30 minutes to 48 hours or more, depending on the volume of data changes generated.
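The dependence of the backup window on data volume can be illustrated with simple arithmetic. The following is a minimal sketch, assuming the window is dominated by a constant sustained transfer rate to the backup device; the function name, parameters, and the example figures are hypothetical and are not taken from any particular system.

```python
def backup_window_hours(changed_bytes: int, throughput_bytes_per_sec: int) -> float:
    """Estimate how long a backup takes, in hours.

    Assumes (hypothetically) a constant sustained throughput and ignores
    tape positioning, file-system scanning, and network contention.
    """
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    seconds = changed_bytes / throughput_bytes_per_sec
    return seconds / 3600.0


# Illustrative figures only: 360 GB of changed data at 100 MB/s.
hours = backup_window_hours(360 * 10**9, 100 * 10**6)
```

Under this simplified model, a host that generates ten times more changed data needs a backup window ten times longer at the same throughput.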
Another problem is that the backup data is sent from the host nodes to the tape drive over the LAN. Transferring the data from RAID to tape is typically a manually orchestrated batch-mode operation performed by an administrator with the help of backup software. Under these conditions, the operating host data network (LAN) must share bandwidth with the components involved in securing the backup data to tape media.
Yet another limitation of prior art systems is that if a data recovery operation is required and the desired data has already been archived to tape media, the recovery process is much slower than, for example, recovery of near-term data from a hard disk drive.
Still more limitations are apparent in the practices of prior art storage and backup systems. For example, with prior art data backup software, data movement originates from each of the hosts. Moreover, most prior art backup systems perform backup operations at the file level in a non-continuous fashion, which can cause additional disk seeks to determine which files have actually changed by the scheduled backup time.
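The cost of file-level, non-continuous backup can be seen in a sketch of how changed files might be identified at the scheduled backup time. The example below is an assumption made for illustration, not the method of any particular backup product: it walks a directory tree and compares each file's modification time against the time of the last backup. The function name and parameters are hypothetical.

```python
import os
from typing import List


def files_changed_since(root: str, last_backup_time: float) -> List[str]:
    """Return paths under root modified after last_backup_time.

    last_backup_time is a POSIX timestamp in seconds. Every file is
    examined with a stat call, even if it has not changed, which is the
    source of the extra disk activity described above.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it in this sketch.
                continue
            if mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Note that the scan touches the metadata of every file in the tree regardless of how few files changed, so its cost grows with the total number of files rather than with the volume of changes.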
What is needed in the art is a method and apparatus for archiving data to a backup data-storage sub-system that solves the above problems.