In the field of data storage and recovery, storage area networks (SANs) are increasingly used to store data because of their high speed, reliability, and relative fault tolerance.
A SAN is a high-speed network that separates storage traffic from other types of network traffic. Prominent standards used in data storage in conventional art are Small Computer System Interface (SCSI), Fibre Channel (FC), Serial Attached SCSI (SAS), and ATA/SATA. The redundant array of independent disks (RAID) standard is used for creating fault-tolerant data storage.
State-of-the-art storage systems comprise a SAN network of storage devices and host nodes, in many instances connected together by a Fibre Channel (FC) switch. There are other variations of SAN architecture that use a different transport protocol such as high speed Ethernet in place of the FC protocol for the storage network.
In addition to the above, more recent enhancements such as iSCSI and Fibre Channel over Internet Protocol (FCIP) have enabled data storage networks to be managed as two or more networked SAN storage islands connected into a larger network by an IP tunnel. In these cases, which are not yet widely practiced, TCP/IP and encapsulation (frame packaging) methods are used to enable data storage devices to communicate with each other efficiently over the Internet in a dedicated manner.
In a typical application, data generated by nodes on a host network is written to primary storage on a SAN. Data from the primary storage system is then typically archived or backed up to tape media in batch mode at the end of a work period. Typically, a larger number of data-generating machines in a host network, such as PCs and servers, back up their data to a smaller number of mass storage devices, such as a tape library. For many applications leveraging an off-site storage solution, data written to a primary storage system is transferred to one or more tape drive systems as described above for archiving to magnetic tape media, which can be securely stored off-location on behalf of an enterprise.
A typical problem with the backup operation (writing the data to tape) is that data generated by some machines during any work period can be of a very large volume and can take considerably longer to back up than data from other machines. The backup window for a specific host can range anywhere from 30 minutes to 48 hours or more, depending on the volume of data changes generated.
Another problem is that the backup data is sent from the host nodes to the tape drive over the LAN. Rendering the data from RAID to tape is typically done in a manually orchestrated batch mode operation performed by an administrator with the help of backup software. Under these conditions the operating host data network (LAN) must share bandwidth with the components involved in securing the backup data to tape media.
Yet another limitation of prior art systems is that if a data recovery operation is required, wherein the data desired is already archived to tape media, the recovery process is much slower than, for example, recovery of near-term data from a hard disk drive.
There are still more limitations apparent in the practices of prior art storage and backup systems. For example, with prior art data backup software, data movement originates from each of the hosts. Moreover, most prior art backup systems perform backup operations at the file level in a non-continuous fashion, which can cause additional disk seeks to determine which files have actually changed by the scheduled backup time.
The inventors are aware of a system for providing secondary data storage and recovery services for one or more networked host nodes. The system has a server application for facilitating data backup and recovery services; at least one client application for facilitating host node configuration to receive services; a secondary data storage medium; and at least one mechanism for passive acquisition of data from the one or more host nodes for storage into the secondary data storage medium by the server application. In a preferred embodiment secondary storage is streamlined through continuous data backup and enhanced by elimination of redundant write data.
In this system, data is passively split off of the data storage paths for each host writing data to storage, using off-the-shelf data-splitting hardware. The data is sent to a secondary storage server, which writes it to a secondary storage disk for near-term data storage and recovery purposes. Among other enhancements over prior-art storage methods, the write data is checked for redundancy over a single host or over multiple hosts using a metadata table that describes previous writes made by the same hosts and compares new writes to the old ones. In this way, only changed data, or data that is new, is retained and stored on the secondary storage medium.
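The redundancy check described above can be illustrated with a minimal sketch. It is not the implementation of the system referred to; the class name, the (host, offset) key, and the use of a SHA-256 digest to compare a new write with the previous one are all assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SecondaryStore:
    """Toy model of a secondary storage server that retains only changed writes."""

    # Hypothetical metadata table: (host_id, offset) -> digest of the last retained payload.
    metadata: dict = field(default_factory=dict)
    # Writes that were actually retained, in arrival order.
    retained: list = field(default_factory=list)

    def handle_write(self, host_id: str, offset: int, payload: bytes) -> bool:
        """Return True if the write was retained, False if it was redundant."""
        digest = hashlib.sha256(payload).hexdigest()
        key = (host_id, offset)
        # Compare the new write against the previous write recorded for this key.
        if self.metadata.get(key) == digest:
            return False  # identical to the previous write: discard
        self.metadata[key] = digest
        self.retained.append((host_id, offset, payload))
        return True
```

In this sketch a repeated write of the same payload to the same offset by the same host is dropped, while a changed payload is retained. Checking redundancy across multiple hosts would need a table keyed differently (for example, on the digest alone), which is not shown here.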
It has occurred to the inventors that direct access to primary storage data from a secondary storage server will, in the majority of cases, leverage the caching provided by the primary storage if the read happens soon after the original data write. Taking advantage of this behavior would reduce the overhead from data reads.
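The caching behavior noted here can be pictured with a small model. The sketch below assumes, purely for illustration, a primary storage with a write-through block cache and least-recently-used eviction; the class, its fields, and the cache size are hypothetical and do not describe any particular storage product.

```python
from collections import OrderedDict


class PrimaryStorage:
    """Toy primary storage with a small write-through LRU block cache."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks
        self.cache = OrderedDict()  # block -> data, most recently used last
        self.disk = {}
        self.cache_hits = 0
        self.disk_reads = 0

    def _touch(self, block: int, data: bytes) -> None:
        # Mark the block as most recently used and evict the oldest if over capacity.
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)

    def write(self, block: int, data: bytes) -> None:
        self.disk[block] = data  # write-through to disk
        self._touch(block, data)

    def read(self, block: int) -> bytes:
        if block in self.cache:
            self.cache_hits += 1  # served from cache, no disk access
            data = self.cache[block]
        else:
            self.disk_reads += 1  # block already evicted: must go to disk
            data = self.disk[block]
        self._touch(block, data)
        return data
```

Under these assumptions, a read issued right after the corresponding write is served from the cache, whereas a read issued after enough intervening writes have evicted the block falls through to disk.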