1. Field of the Invention
The invention relates to data storage in a computer network and, more particularly, to a system and method for optimizing storage operations.
2. Description of Related Art
The GALAXY data storage management system software manufactured by COMMVAULT SYSTEMS, INC. of Oceanport, N.J., uses storage policies to direct how data is to be stored. Referring to FIG. 1, there is shown a library storage system 100 in accordance with the prior art. Storage policies 20 in a management server 21 may be used to map copy data from a source 24, through a media agent 26, to a physical media location 28, 30, 32, 34, 36, 38 using, e.g., tapes or drives, where data is to be stored. Storage policies 20 are generally created at the time of installation of each media library and/or stand-alone drive. Numerous storage policies may be created and modified to meet storage management needs. A storage policy allows the user to define how, where, and for how long data should be stored without requiring intimate knowledge or understanding of the underlying storage architecture and technology. The management details of the storage operations are transparent to the user.
Storage policies 20 can be viewed as a logical concept that directs the creation of one or more copies of stored data, with each copy being a self-contained unit of information. Each copy may contain data from multiple applications and from multiple clients or data sources. Within each copy are one or more archives, each relating to a particular application. For example, one archive might contain log files related to a data store and another archive in the same copy might contain the data store itself.
Storage systems often have various levels of storage. A primary copy or data set, for example, indicates the default destination of storage operations for a particular set of data that the storage policy relates to and is tied to a particular set of drives. These drives are addressed independently of the library or media agent to which they are attached. In FIG. 1, the primary drives are media 28, 30, 32, 34, 36 and 38. Clearly other forms of storage media could be used such as tapes or optical media. The primary data set might, for example, contain data that is frequently accessed for a period of one to two weeks after it is stored. A storage administrator might find storing such data on a set of drives with fast access times preferable. On the other hand, such fast drives are expensive and once the data is no longer accessed as frequently, the storage administrator might find it desirable to move and copy this data to an auxiliary or secondary copy data set on a less expensive tape library or other device with slower access times. Once the data from the primary data set is moved to the auxiliary data set, the data can be pruned from the primary data set freeing up drive space for new data. It is thus often desirable to perform an auxiliary storage operation after a primary data set has been created. In FIG. 1, the auxiliary data set is copied to drives or tapes 40, 42 and 44.
Storage policies generally include a copy name, a data stream, and a media group. A primary copy name may be established by default whenever a storage policy for a particular client is created and contains the data directed to the storage policy. A data stream is a channel between the source of the data and the storage media, such as data streams 50 and 52 in FIG. 1. Such a data stream is discussed in HIGH-SPEED DATA TRANSFER MECHANISM, Ser. No. 09/038,440 referenced above. To increase the speed of a copy, data to be backed up is frequently divided into a plurality of smaller pieces of data and these pieces are sent to a plurality of storage media using their own respective data streams. In FIG. 1, data from source 24 is broken into two portions and sent using streams 50, 52 to media 28, 36.
A client's data is thereby broken down into a plurality of sub-clients. In FIG. 1, media 28, 30, 32 and 34 may comprise a single media group and media 36 and 38 a second media group. A media group generally refers to a collection of one or more physical pieces of storage media. Only a single piece of media within the group is typically active at one time and data streams are sent to that medium until it reaches full capacity. For example, data stream 50 will feed source data to medium 28 until it is full and then feed data to medium 30. Multiple copies may be performed using multiple streams each directed to a respective media group using multiple storage policies.
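The fill behavior of a media group described above can be illustrated with a minimal Python sketch. The class names `Medium` and `MediaGroup`, the `write` method, and the capacity figures are hypothetical and chosen only for illustration; they are not part of the GALAXY software or of any interface described in this document.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    """One physical piece of storage media (hypothetical model)."""
    label: str
    capacity: int   # capacity in arbitrary units (assumed for illustration)
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


@dataclass
class MediaGroup:
    """A collection of media in which only one medium is active at a time."""
    media: list
    active: int = 0  # index of the currently active medium

    def write(self, size: int) -> list:
        """Write `size` units, moving to the next medium when one fills.

        Returns a list of (label, units_written) pairs.
        """
        placements = []
        while size > 0:
            if self.active >= len(self.media):
                raise RuntimeError("media group is full")
            medium = self.media[self.active]
            if medium.free == 0:
                self.active += 1  # active medium is full; activate the next
                continue
            written = min(size, medium.free)
            medium.used += written
            size -= written
            placements.append((medium.label, written))
        return placements


# Labels reuse the reference numerals of FIG. 1 for readability only.
group = MediaGroup([Medium("28", 100), Medium("30", 100)])
first = group.write(80)   # fits entirely on medium 28
second = group.write(50)  # 20 units fill medium 28, the rest goes to medium 30
```

In this sketch, a single write may end up split across two media, which is one way data belonging to one stream becomes scattered over several pieces of media.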
Auxiliary copying, discussed in more detail in commonly owned application Ser. No. 10/303,640, denotes the creation of secondary copies, such as medium 40 or medium 42, of the primary copy. Since auxiliary copying involves multiple storage policies and data streams which each point to a particular media group, data is likely scattered over several pieces of media. Even data related to single-stream copy operations might be scattered over several media. Auxiliary copying is generally performed on a stream-by-stream basis, one stream at a time, in an attempt to minimize the number of times the primary media are mounted/unmounted. For example, for a copy of 10 pieces of primary media where four streams are used, auxiliary copying first entails copying all archive files of the first stream to a first set of auxiliary media, then the second stream to a second set of auxiliary media, etc. In FIG. 1, an auxiliary copy of stream 50 is made using auxiliary stream 50a to medium 40 and, if needed, medium 42. Thereafter, an auxiliary copy of stream 52 is made using auxiliary stream 52a to medium 44.
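The stream-by-stream ordering of a conventional auxiliary copy can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the function name `auxiliary_copy_stream_by_stream`, the dictionary layout, the archive file sizes, and the uniform media capacity are all hypothetical.

```python
def auxiliary_copy_stream_by_stream(streams, aux_sets, capacity):
    """Copy each stream, one at a time, to its own set of auxiliary media.

    streams:  dict mapping a stream id to a list of archive file sizes
    aux_sets: dict mapping a stream id to an ordered list of media labels
    capacity: capacity of every auxiliary medium (assumed uniform)

    Returns a dict mapping each auxiliary medium used to the units written.
    """
    usage = {}
    for stream_id, sizes in streams.items():  # one stream at a time
        labels = iter(aux_sets[stream_id])
        label = next(labels)                  # first medium of this stream's set
        usage[label] = 0
        for size in sizes:
            while size > 0:
                free = capacity - usage[label]
                if free == 0:
                    label = next(labels)      # medium full; use the next in the set
                    usage[label] = 0
                    continue
                written = min(size, free)
                usage[label] += written
                size -= written
    return usage


# Stream and media labels reuse the reference numerals of FIG. 1.
usage = auxiliary_copy_stream_by_stream(
    streams={"50a": [30, 10], "52a": [25]},
    aux_sets={"50a": ["40", "42"], "52a": ["44"]},
    capacity=100,
)
```

In this run the archive files of stream 50a land on medium 40 and those of stream 52a on medium 44; medium 42 is never needed. Because each stream is given its own set of auxiliary media, the two streams never share a medium, whatever their sizes.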
An archive file, at least with respect to auxiliary copying, is generally copied from a first chunk of data to a last chunk. When an auxiliary copy operation is cancelled or suspended before all chunks of an archive file are successfully copied to the destination copy, the chunks successfully copied are generally discarded or overwritten later when the archive file is again copied to the same copy or medium. This is undesirable because it wastes time and resources to copy the same chunks repeatedly; it wastes media because useless data occupies the media until the media is reusable; and if the network is not stable, a large archive file may never be successfully copied.
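The cost of always restarting from the first chunk can be made concrete with a short simulation. The function `copy_with_full_restart`, its parameters, and the chunk counts below are hypothetical and serve only to illustrate the behavior described above.

```python
def copy_with_full_restart(chunks, fail_after):
    """Simulate an archive file copy that restarts from the first chunk.

    chunks:     the chunks of the archive file, in order
    fail_after: for each interrupted attempt, the number of chunks copied
                before the operation was cancelled or suspended

    Returns (destination, transfers): the final destination contents and
    the total number of chunk transfers performed across all attempts.
    """
    destination = []
    transfers = 0
    # The trailing None stands for a final attempt that is not interrupted.
    for limit in list(fail_after) + [None]:
        destination.clear()  # chunks copied by earlier attempts are discarded
        for index, chunk in enumerate(chunks):
            if limit is not None and index == limit:
                break        # attempt cancelled or suspended here
            destination.append(chunk)
            transfers += 1
        else:
            # The inner loop finished without interruption: copy complete.
            return destination, transfers
    return destination, transfers


# A 10-chunk archive file interrupted after 7 chunks, then after 4 chunks.
copied, total_transfers = copy_with_full_restart(list(range(10)), [7, 4])
```

Here 21 chunk transfers (7 + 4 + 10) are needed to deliver a 10-chunk archive file, so 11 transfers are wasted on chunks that were later discarded.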
Although the GALAXY data storage management system software provides numerous advantages over other data storage management systems, the process for restoring copied data may require access to several media, which involves multiple mounting/unmounting of media, thereby increasing the time necessary for a restoration. Additionally, although an effort is made to minimize the number of times media are mounted and unmounted, the stream-by-stream basis used in auxiliary copying does not minimize the number of mount/unmount operations necessary for the auxiliary copy and does not minimize tape usage. For example, in FIG. 1, media 40 and 44 may both be less than half full, but both are needed to copy data through streams 50a, 52a using conventional techniques and both must be remounted for a restore. Performing auxiliary copying on a stream-by-stream basis is also generally a lengthy process. Finally, restarting a copy of an archive file that has been cancelled or suspended by always copying the first to the last chunk is inefficient with respect to media usage and the time necessary to complete a copy.
There is therefore a need in the art for a system and method for increasing the efficiency of storage management systems.