The present invention relates to a computer primary data storage system that integrates the functionality of file backup and remote replication to provide an integrated storage system that protects its data from loss related to system or network failures or the physical loss of a data center.
Traditional primary disk storage systems consist of disk drives attached to an I/O channel or to a redundant array of independent disks (RAID) controller. Although many of these systems use microprocessors to coordinate the handling of requests from clients/servers and to perform RAID data protection, they were not designed to subsume the functionality of traditional data protection systems and software, such as magnetic tape-based backup. RAID data protection systems manage data only at the sector/block level and have no knowledge of which collection of blocks, in which order, makes up a file, so they cannot perform file-level integrity checking of data.
By far, the most common form of magnetic disk data protection is the periodic backup of its data onto magnetic tape. There are many issues associated with magnetic tape-based data protection schemes:
        Magnetic tape has not been able to maintain the same cost-per-gigabyte reductions that magnetic disk drives have over the past 17 years. In 1985, magnetic tape storage was about thirty-five times less expensive than magnetic disk, making it a cost-effective choice as a backup storage medium. In 2002, magnetic disk drives were only about twice as expensive as magnetic tape. This trend is expected to continue, to the point where it may become more costly to store data on magnetic tape than on magnetic disk.
        Magnetic tape has historically demonstrated low reliability, and this translates into low customer confidence in being able to restore data from tape. There are many reported instances of poor tape reliability that translate into unsuccessful data restores. As an example, the Jan. 29, 2003 issue of ComputerWorld provided the following quote of a network administrator at a major New York-based financial services company: “85% of my company's backups contained major errors that made the recovered data either totally unusable or incomplete.”
        One of the reasons for poor recoverability of data stored on tape is related to the wear-out mechanisms between tape drive heads and the media itself. When tape media contacts a tape drive head, both the media and the drive head experience friction and wear. Even with a single tape, the quality of the recording varies, based on the state of the tape drive head and the magnetic tape at the time of the recording.
        Because tape media quality diminishes over time, storage administrators must refresh their tapes periodically. This involves copying data from the older tape onto a new tape. This is a very time-consuming process, with each tape taking multiple hours to copy. For this reason, the process is rarely performed.
        There are many incompatible magnetic tape and tape drive technologies. Even within a product line from a single vendor, there are older versions of tape that are not readable by that vendor's latest tape drives. Once a technology is selected by a customer and is used for many years, it is difficult for that customer to change to a different tape technology. Typically, either large repositories of tapes have to be migrated to the new tape technology or the customer's administrator must maintain multiple incompatible tape drive systems.
        The archive environmental requirements for magnetic tape are more restrictive than those for magnetic disk. When tape is subjected to the environmental changes that occur during media transport to an offsite storage facility, the reliability and readability of the data on the media are diminished from that time forward. The following table shows the relative archive environmental limits for both magnetic tape and magnetic disk technologies.
                                        Magnetic Tape    Magnetic Disk
        Archive Temperature (°C)        18 to 28         −40 to 65
        Archive Humidity (%)            40 to 60         5 to 95

        It takes significant administrative effort to manage removable media with today's magnetic tape and optical disk solutions. An administrator typically must manually move these media from online jukeboxes to offline shelves and possibly to offsite storage locations. The greatly varying environmental conditions that tapes are subjected to during shipment to offsite locations, together with the shock and vibration associated with handling and shipping tape media, reduce the reliability and availability of tape-based data.
        When magnetic tape must be used to recover data after the failure of a computer system or the loss of an entire site, the recovery process can take days or even weeks to complete. Storage administrators must review backup catalogs, retrieve sets of tapes from local or offsite storage facilities, and rebuild a tape-based recovery infrastructure with servers, tape library units, and backup software. Next, they must reload all required tapes and, if necessary, respond to any tape media and tape drive failures that prevent data from being recovered successfully.
        The use of backup software contributes to significant tape media costs due to over-replication of data. Each week, most companies perform full backups and maintain as much as a year's worth of these full backup tape sets. Typically, each full backup tape set will contain more than 80% of the same content as the last full backup. So after a year, the customer has over 50 tape sets of mostly replicated data.
        It is difficult to eliminate or recycle tapes from a large tape archive. Critical content that must be preserved resides on the same tape medium as content that no longer needs to be maintained. For this reason, tape archives expand beyond reasonable administrative control.
        It is impossible to ascertain the quality of data on specific magnetic tapes within an archive without placing each tape into a tape drive and reading that tape from beginning to end. It may take hours to complete the scan of one tape alone. This is a time-consuming process, and even when data is found to be damaged, there is typically no way to replace the damaged data with known, good data.
To the extent that disk-based data protection systems exist, a need remains for a comprehensive and cost-effective data backup system that allows users to effectively adjust their backup strategies as their needs change and that ensures the integrity of the data that is backed up.