The invention disclosed herein relates generally to performing storage operations on electronic data in a computer network. More particularly, the present invention relates to selecting, in response to the initiation of a storage operation and according to selection logic, a media management component and a network storage device to perform storage operations on electronic data.
Storage of electronic data has evolved through many forms. During the early development of the computer, storage of this data was limited to individual computers. Electronic data was stored in the Random Access Memory (RAM) or some other storage medium such as a hard drive or tape drive that was an actual part of the individual computer.
Later, with the advent of networked computing, storage of electronic data gradually migrated from the individual computer to stand-alone storage devices accessible via a network. These individual network storage devices soon evolved in the form of networkable tape drives, optical libraries, Redundant Arrays of Inexpensive Disks (RAID), CD-ROM jukeboxes, and other devices. Common architectures included drive pools, which generally are logical collections of drives with associated media groups including the tapes or other storage media used by a given drive pool.
Serial, parallel, Small Computer System Interface (SCSI), or other cables directly connected these stand-alone storage devices to individual computers that were part of a network of other computers such as a Local Area Network (LAN) or a Wide Area Network (WAN). Each individual computer on the network controlled the storage devices that were physically attached to that computer and could also access the storage devices of the other network computers to perform backups, transaction processing, file sharing, and other storage-related operations.
Network Attached Storage (NAS) is another storage scheme using stand-alone storage devices in a LAN or other such network. In NAS, a storage controller computer still “owns” the storage device to the exclusion of other computers on the network, but the SCSI or other cabling directly connecting that storage device to the individual controller or owner computer is eliminated. Instead, storage devices are directly attached to the network itself.
Yet another network storage scheme is modular storage architecture, which is more fully described in application Ser. No. 09/610,738 and application Ser. No. 09/744,268. An example of software implementing such an architecture is the Galaxy™ system, by CommVault Systems of Oceanport, N.J. The Galaxy™ system is a multi-tiered storage management solution which includes, among other components, a storage manager, one or more media agents, and one or more storage devices. The storage manager directs storage operations of client data to storage devices such as magnetic and optical media libraries. Media agents are storage controller computers that serve as intermediary devices managing the flow of data from client information stores to individual storage devices. Each storage device is uniquely associated with a particular media agent, and this association is tracked by the storage manager.
A common feature shared by all of the above-described network architectures is the static relationship between storage controller computers and storage devices. In these traditional network architectures, each storage device can be connected, virtually or physically, to only a single storage controller computer. Only the storage controller computer to which a particular device is physically connected has read/write access to that device. A drive pool and its associated media group, for example, can only be controlled by the computer to which it is directly connected. Therefore, all backup data from other storage controller computers must be sent via the network before it can be stored on the storage device connected to the first storage controller computer.
At times, storage solutions in some of the above-described network architectures, including LAN, NAS, and modular storage systems, may overload the network during certain operations associated with use of storage devices on the network. The network cable has a limited amount of bandwidth that must be shared among all the computers on the network. The capacity of most LAN or network cabling is measured in megabits per second (Mbps), with 10 Mbps and 100 Mbps being standard. During common operations such as system backups, transaction processing, file copies, and other similar operations, the network often becomes overloaded as hundreds of megabytes (MB) and gigabytes (GB) of information are sent over the network to the associated storage devices. The capacity of the network computers to stream data over the network to the associated storage devices in this manner is greater than the bandwidth capacity of the cabling itself, so ordinary network activity and communication slows to a crawl. As long as the storage devices are attached to the LAN or other network, this bandwidth issue remains a problem.
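The scale of the bandwidth problem described above can be illustrated with a short calculation. The following is a minimal sketch only; the 1 GB backup size, the use of decimal units (1 GB taken as 1000 MB), the assumption that the full link rate is available to the transfer with no protocol overhead, and the function name transfer_seconds are all illustrative assumptions and are not taken from any particular system.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Idealized time to send a payload over a link.

    Assumes decimal units and that the whole link rate is available
    to this one transfer (no protocol overhead, no competing traffic).
    """
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / link_megabits_per_second


# Hypothetical 1 GB backup (assumed here to be 1000 MB).
backup_megabytes = 1000

seconds_at_10 = transfer_seconds(backup_megabytes, 10)    # 10 Mbps link
seconds_at_100 = transfer_seconds(backup_megabytes, 100)  # 100 Mbps link

print(f"10 Mbps:  {seconds_at_10:.0f} s")
print(f"100 Mbps: {seconds_at_100:.0f} s")
```

Under these idealized assumptions, a single such transfer would occupy a 10 Mbps link for over thirteen minutes, during which other traffic on the same cabling competes for the same bandwidth.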
The Storage Area Network (SAN) is a highly evolved network architecture designed to facilitate transport of electronic data and address this bandwidth issue. SAN architecture requires at least two networks. First, there is the traditional network described above, which is typically a LAN or other such network designed to transport ordinary traffic between network computers. Then, there is the SAN itself, which is a second network that is attached to the servers of the first network. The SAN is a separate network generally reserved for the bandwidth-intensive operations described above, such as backups, transaction processing, and the like. The cabling used in the SAN is usually of much higher bandwidth capacity than that used in the first network, such as the LAN, and the communication protocols used over the SAN cabling are optimized for bandwidth-intensive traffic. Most importantly, the storage devices used by the network computers for the bandwidth-intensive operations are attached to the SAN rather than the LAN. Thus, when the bandwidth-intensive operations are required, they take place over the SAN and the LAN remains unaffected.
CommVault's proprietary DataPipe™ mechanism further described in U.S. Pat. No. 6,418,478 is used with a SAN to further reduce bandwidth constraints. The DataPipe™ is the transport protocol used to facilitate and optimize electronic data transfers taking place over a Storage Area Network (SAN) as opposed to those taking place over a LAN using NAS.
None of these solutions, however, addresses the static relationship between individual storage controller computers and individual storage devices. LANs, WANs, and even SANs using a DataPipe™ all require a static relationship between storage controller computer and storage device, since each storage device on the network is uniquely owned by a storage controller computer. As discussed, when a storage device in this traditional architecture is assigned to a storage controller computer, that storage controller computer owns the device indefinitely and to the exclusion of other computers on the network. The same is true of both logical and physical storage volumes. One computer cannot control the drive pool and media group that is controlled by another. Requests to store and retrieve data from such a drive pool and media group would have to first pass through the controlling computer. Such a static relationship between storage controller computer and storage device often leads to an inefficient use of resources.
For example, if each storage controller computer needs access to two storage devices and there are five storage controller computers in the network, then a total of ten storage devices will be required. The actual amount of work each of the ten storage devices performs might be much less than the workload capacity of each storage device. Such underutilization of storage device resources cannot be solved when a static relationship is required between storage device and storage controller computer.
If the relationship were dynamic rather than static, however, and storage controller computers could actually share devices, then this underutilization could be addressed. Assuming in the above example that each of the five storage controller computers uses only ten percent of each device's workload capacity, then if all the storage controller computers could share the same two storage devices, each shared device would run at only fifty percent of its capacity, and eight of the storage devices could be eliminated without loss of performance or capability.
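The arithmetic of the above example can be checked with a short calculation. The following is a minimal sketch using only the figures given in the example (five storage controller computers, two storage devices each, ten percent load per device); the variable names are hypothetical and chosen for illustration.

```python
# Figures from the example above.
controllers = 5                # storage controller computers in the network
devices_per_controller = 2     # storage devices each controller needs
load_percent_per_device = 10   # share of a device's capacity one controller uses

# Static relationship: every controller owns its own devices.
static_devices = controllers * devices_per_controller

# Dynamic relationship: all controllers share the same two devices,
# so each shared device carries the combined load of all controllers.
shared_devices = devices_per_controller
shared_load_percent = controllers * load_percent_per_device

devices_eliminated = static_devices - shared_devices

print(f"Devices required (static): {static_devices}")
print(f"Devices required (shared): {shared_devices}")
print(f"Load on each shared device: {shared_load_percent}%")
print(f"Devices eliminated: {devices_eliminated}")
```

Because the combined load on each shared device remains below one hundred percent, the example's conclusion that eight devices could be eliminated without loss of capability is consistent with the stated figures.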
Furthermore, none of these existing solutions provides access to storage devices in the event of a storage controller failure. For example, if a storage controller computer were unavailable due to a hardware or software malfunction, then other computers on the network would not be able to access data stored on any storage device associated with that storage controller computer. Until the storage controller computer was brought back online, the data contained on any associated storage device would be effectively unrecoverable. If the association between the storage controller computer and a storage device were not static, however, then another storage controller computer could bypass the unavailable storage controller computer and access the storage device to retrieve the data.
There is thus also a need for a system which enables dynamic association of storage controller computers and storage devices.