The invention disclosed herein relates generally to data storage systems in computer networks and, more particularly, to improvements in storing electronic data, verifying the accurate archiving of electronic data, and continuing to reverify archives of electronic data.
Storage architectures used by individual computers or data stores to store electronic data typically include volatile storage media such as Random Access Memory “RAM”, and one or more nonvolatile storage devices such as hard drives, tape drives, optical disks, and other storage devices that form a part of or are directly associated with an individual computer. Such storage devices may provide primary storage for a primary copy of data.
A network of computers such as a Local Area Network “LAN” or a Wide Area Network “WAN” typically stores electronic data via servers or storage devices accessible via the network. Storage devices are generally connected to one individual computer or to a network of computers. Network storage devices commonly known in the art typically include physical drives in which tapes or other storage media are stored and a robotic arm which is used to place the tapes or storage media into the drives. Examples of network storage devices include networkable tape libraries, optical libraries, Redundant Arrays of Inexpensive Disks “RAID”, and other devices. Another network storage device may be Network Attached Storage “NAS”, which includes storage devices that may provide file services and one or more devices on which data is stored.
The first copy of production data generated by a client is sometimes referred to as the primary copy, and is used in the first instance to restore the production data in the event of a disaster or other loss or corruption of the production data. Under traditional tiered storage, the data on the primary storage device is migrated to other devices, sometimes referred to as secondary or auxiliary storage devices. This migration can occur after a certain amount of time has passed since the data was first stored on the primary device, or for certain types of data as selected in accordance with a user-defined policy. Usually, with tiered storage patterns, the storage devices used to store auxiliary or secondary copies of data have less availability, lower performance, and/or fewer resources than devices storing the production or primary copies. That is, primary storage devices tend to be faster, higher-capacity, and more readily available devices, such as magnetic hard drives, than the ones used for storing auxiliary copies, such as magnetic or optical disks or other removable media storage devices.
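The two migration triggers described above (elapsed time since first storage, or data type selected by a user-defined policy) can be sketched as follows. This is a minimal illustration only; the names `StoredItem`, `MigrationPolicy`, `should_migrate`, and `select_for_migration`, and the policy fields themselves, are hypothetical and are not drawn from any particular storage system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredItem:
    """Hypothetical record for one data item held on primary storage."""
    name: str
    data_type: str
    stored_at: datetime  # when the item was first stored on the primary device


@dataclass
class MigrationPolicy:
    """Hypothetical user-defined policy with an age trigger and a type trigger."""
    max_age: timedelta
    always_migrate_types: frozenset


def should_migrate(item: StoredItem, policy: MigrationPolicy, now: datetime) -> bool:
    """Return True if the item meets either migration trigger."""
    aged_out = now - item.stored_at >= policy.max_age
    selected_by_type = item.data_type in policy.always_migrate_types
    return aged_out or selected_by_type


def select_for_migration(items, policy: MigrationPolicy, now: datetime) -> list:
    """Return the names of items to move to secondary storage, in input order."""
    return [item.name for item in items if should_migrate(item, policy, now)]
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the selection deterministic and easy to check against a known set of items.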
Electronic data is typically copied to secondary storage according to a schedule; for example, data may be designated to be copied and stored once a day. Generally, data is archived in case a primary or original copy becomes unavailable, for example, because the data is destroyed, lost, or otherwise inaccessible. In general, the data is directed to a system component to be copied to secondary storage media, and stored as an auxiliary copy, a backup copy, a quick recovery copy, or other copy. Some systems check the secondary copy to ensure the secondary copy is accurate. Generally, the check includes steps such as analyzing each data item copied and comparing it to the original data, a fingerprint, a hash, or another segment of data, or applying another method. Such verification methods can be lengthy and time consuming for copies of large volumes of data, requiring significant use of system resources. Alternatively, some systems use cursory data checks when a secondary copy is made, such as only comparing the names of files copied to secondary storage with file names from primary storage, which is less time consuming but also yields a less reliable data check.
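The trade-off between the two kinds of check can be illustrated with a short sketch. It assumes, purely for illustration, that the primary and secondary copies are ordinary directory trees and that SHA-256 serves as the hash; the function names `file_digest`, `relative_files`, `cursory_check`, and `full_check` are hypothetical and do not describe any specific product.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root: Path) -> set[str]:
    """Return the set of file paths under root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def cursory_check(primary: Path, secondary: Path) -> bool:
    """Fast check: compare file names only, without reading any contents."""
    return relative_files(primary) == relative_files(secondary)


def full_check(primary: Path, secondary: Path) -> list[str]:
    """Slow check: return names that are missing from, or differ in, the copy."""
    mismatches = []
    for name in sorted(relative_files(primary)):
        copy = secondary / name
        if not copy.is_file() or file_digest(primary / name) != file_digest(copy):
            mismatches.append(name)
    return mismatches
```

In this sketch the full check reads every byte of both copies, so its cost grows with data volume, whereas the cursory check touches only directory entries and would not notice a file whose contents were corrupted during copying.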
Data copies stored to media may have a shelf life which may be based on media life expectancy. To maintain a reliable set of copies, a subsequent data copy (such as a copy of a copy) may be made before the end of a media item's life expectancy. A media manufacturer may provide an indication of a media item's life expectancy; however, the life expectancy may not take into account user or enterprise use of the media or other media characteristics. Thus, in use, media may actually have a shorter or longer life than its life expectancy because of media use, other media characteristics, or other external factors. Since media may be costly, a user or enterprise may wish to maximize media life and use of media while avoiding media failure.