This invention relates to tape data storage systems and, in particular, to a plurality of tape devices which are connected to a plurality of data processors via a network and which collectively implement a virtual, distributed tape data storage subsystem. The virtual, distributed tape data storage subsystem realizes multiple virtual devices, which are available to any of the data processors via the network, whose bandwidth is scalable and can be changed on demand.
It is a problem in the field of data storage subsystems to provide adequate data storage service to the data processors that are connected to the data storage devices. There are numerous data storage media in use as well as corresponding data storage subsystem configurations which attempt to improve the data storage capabilities of the data storage media that are used to implement the data storage devices. For example, increases in the areal density of data storage products translate into lower data storage costs per bit, but do not always yield higher data transfer rates. Achieving increased data transfer rates requires architectural approaches to data storage rather than data storage device improvements. One other aspect of this data storage problem is that the allocation of customer data to a single type of data storage media represents a limitation when faced with widely varying data storage needs. This limitation can be partly obviated by balancing I/O activity across an array of data storage devices of a data storage subsystem. However, a fixed array configuration of data storage devices also limits the scalability in performance and provides no facility for applications to request changes in performance. An architecture where the data storage devices are located behind a server further limits the delivered performance since the bandwidth is limited by the server itself. Therefore, architecting a data storage subsystem that can efficiently serve the needs of the applications extant on the data processors is a daunting problem. There are numerous factors which affect performance, and this problem is particularly pertinent to tape devices, since the tape media is experiencing significant enhancements to its data storage capacity.
The traditional tape device is directly connected to a single data processor in a dedicated tape device configuration. The data processor has exclusive use of the tape device and typically communicates with the tape device via a SCSI interface. However, the use of dedicated tape devices is an expensive proposition where there are a plurality of data processors to be served, especially if the data access loads generated by the plurality of data processors are erratic. In this data storage subsystem architecture, the utilization of the tape devices and the efficiency of the data storage function are less than optimal, since each data processor is limited to its dedicated tape device and its physical constraints.
An alternative data storage subsystem architecture is to connect a plurality of tape devices along with a plurality of data processors to a common data communication network. In this architecture, the data processors all have access to all of the tape devices. The data processors run tape server software to manage the access protocol for the plurality of tape devices. Among the problems with the network interconnected tape devices is that it is difficult to share a tape device among a plurality of data processors. To provide enhanced response time, the tape devices can be served by an automated tape cartridge library system which mounts/dismounts the tape cartridges for the plurality of tape devices served by the automated tape cartridge library system. However, the tape cartridge library systems typically have a SCSI interface in the data path, and the SCSI interface introduces a number of physical limitations to the operation of the automated tape cartridge library system. The first limitation is that only a small number of tape devices can be attached to a SCSI bus compared to other bus architectures. The second limitation is the limited bandwidth of the SCSI bus that is shared by these tape devices. The length of the SCSI cable may also represent an additional limitation, since the length of the SCSI bus is typically limited to 25 feet.
A variation of this network data storage architecture is the use of a plurality of tape devices configured into a tape array. The tape devices are configured in a redundant array of data storage devices in a manner analogous to the Redundant Array of Inexpensive Disks (RAID) which is a well known architecture in the disk device technology. The tape array is typically located behind a server, which is directly connected to the network, and the bandwidth for data transfers between the data processors and the tape array is not scalable and is also limited by the characteristics of the server. The tape array itself is also not scalable or easily changed due to the server limitation.
These various tape device based data storage subsystem architectures are all limited in their data exchange bandwidth and are restricted by the use of a single data storage media. The alternative to a single media data storage subsystem is the integration of a plurality of data storage media types and data storage subsystem architectures into a single data storage subsystem, typically termed a mass storage subsystem. One such data storage subsystem designed to address this problem is the mass storage system described in the paper by Sam Coleman and Steve Miller titled "Mass Storage System Reference Model: Version 4", published May 1990 by the IEEE Technical Committee on Mass Storage Systems and Technology. This mass storage system interconnects a plurality of data processors with a diversity of data storage subsystems via a high bandwidth switched network for the transmission of data therebetween at high data transfer rates. A separate network is used to interconnect the data processors with the mass storage system controller, which manages the processing of data transfer requests received from the data processors over the control network. The mass storage system controller is directly connected to the controllers of the various data storage subsystems and transmits data file retrieval requests to the selected controller in response to the data file request received from the data processors. The file staging process used therein copies a data file in its entirety from the mass storage system to the client data processor via the high bandwidth network before the user accesses any of the requested data. Usually the data file is staged from an archival storage device, but staging from a direct access storage device is possible as well. File staging is initiated by a client data processor transmitting a request to the mass storage system identifying the data file by name.
The mass storage system maintains mapping information indicative of the physical memory storage location occupied by the requested data file. The retrieved mapping information is used by the mass storage system to transmit a file retrieval request to the archival storage device on which the requested data file is stored. Upon receipt of this request, the designated storage device copies the requested data file over the high speed network to a local, direct access data storage device that serves the requesting client data processor.
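The staging flow described above can be illustrated with a minimal sketch. It is not taken from the referenced paper; the class name `MassStorageSystem`, the method `stage`, and the dictionary-based layout are all hypothetical, and in-memory dictionaries stand in for the archival devices and for the client's local direct access storage.

```python
class MassStorageSystem:
    """Hypothetical model of name-based file staging (illustration only)."""

    def __init__(self, archives):
        # archives: device name -> {file name -> file contents}
        self.archives = archives
        # Mapping information: file name -> device holding the file.
        self.mapping = {name: device
                        for device, files in archives.items()
                        for name in files}
        self.lookups = 0  # counts mapping-table accesses

    def stage(self, file_name, local_store):
        """Copy the whole named file to the client's local store."""
        self.lookups += 1                  # one mapping lookup per requested file
        device = self.mapping[file_name]   # locate the file by name
        # The entire file is copied before the client reads any of it.
        local_store[file_name] = self.archives[device][file_name]
        return device
```

Note that every staged file costs one access to the mapping information, so a client that reads many files in sequence triggers one lookup per file.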
A significant limitation of this architecture is that the data is managed on a data file basis. Each client data processor request for a data file causes the mass storage system to access the mapping tables to locate the requested data file. When a client data processor sequentially accesses a plurality of data files, the mass storage system must successively access the mapping tables to identify each requested data file. As the extent of the mass storage system data storage capacity increases, the size and extent of the mapping tables proportionately increases and the time required to retrieve mapping information for each data file becomes a significant performance limitation of the mass storage system.
An improvement to the mass storage system is disclosed in the paper by J. L. Sloan et al., titled "MaSSIVE™: The Mass Storage System IV Enterprise", published April 1993 in the Proceedings of the IEEE, Vol. 81, No. 4 and also disclosed in U.S. Pat. No. 5,566,331. The MaSSIVE mass storage system stages entire file-systems as bit files between archival storage devices and direct access storage devices. Because these direct access storage devices are channel-attached to the client data processor, the file-systems contained thereon may be efficiently accessed by the client data processors exactly as if they were located on local data storage devices. Since entire file-systems are staged rather than individual data files, staging becomes less convenient to the user since multiple, associated file sets are staged together. On the other hand, by staging whole file-systems without interpretation to a storage device which is channel-attached to the client data processor, the inefficiencies and bottlenecks of network file service are avoided. Thus, this mass storage system design combines the benefits of file staging and network file service, but at the same time minimizes the drawbacks of each.
Thus, the use of a plurality of different types of media in a mass storage subsystem presents its own set of problems. With advances in massively parallel processing, there is a need for interconnection networks that provide high bandwidth inter-process communications and data storage configurations that match the high bandwidth network capacity. There is also a need for data storage devices that can scale in capability or capacity of data storage and data transfer throughput. These data storage devices should be accessible via a high bandwidth switched network, such that a plurality of paths and links are provided between the data processor and the data storage devices. The data storage devices should be directly connectible to the switching network and not captive behind a server, to thereby increase data throughput. The data transfers should be operational without the need to expend processing resources and the data storage devices should be shared among multiple data processors. The above-noted mass storage systems address the issue of utilizing the high bandwidth network that interconnects the data processors with the data storage subsystems to a high degree of efficiency. However, the data storage subsystems described therein represent archival data storage subsystems and are not efficiently used for simple routine data file access. In addition, the data storage images of the various data storage subsystems are immutable and the mass storage system controller simply functions as a data transfer manager to ensure that the data files or file systems are relocated from the archival data storage subsystem to the local data storage maintained by the data processors. There is no attempt to address the issue of the local storage element used by the data processors. Thus, the local storage systems described in the mass storage system publications suffer the limitation described above with respect to the tape devices.
The above described problems are solved and a technical advance achieved in the field by the network attached virtual tape storage subsystem of the present invention. This invention relates to tape data storage systems and, in particular, to a plurality of tape devices which are connected to a plurality of data processors via a high bandwidth switching network and which collectively implement a virtual, distributed tape data storage subsystem. The virtual, distributed tape data storage subsystem incorporates elements from the mass storage system technology as well as tape device data storage subsystem architectures to realize multiple virtual devices, which are available to any of the data processors, and the bandwidth of the system is scalable and can be changed on demand. By pooling the tape devices together and interconnecting them with the data processors via a fiber data network, the problems of prior art tape device data storage subsystem architectures are overcome. This architecture enables any tape array to be realized, including, but not limited to, RAIT 0, 1, 3, 4, and 5. The tape array and data transmission bandwidth can be dynamically reconfigured, since the network switchably interconnects the tape devices to the data processors.
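To make the tape array idea concrete, the following is a minimal sketch of striping with a single parity block per stripe. It assumes that the RAIT levels mirror the correspondingly numbered RAID levels, which the text above implies by analogy but does not spell out; the sketch places parity on a dedicated last device, in the manner of RAID 3 or 4. The function names `xor_blocks`, `stripe_with_parity`, and `rebuild_block` are hypothetical, and byte strings stand in for the blocks written to each tape device.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_devices, block_size):
    """Split data into stripes of n_data_devices blocks plus one parity block."""
    stripe_bytes = n_data_devices * block_size
    # Zero-pad so the data fills a whole number of stripes.
    padded = data + bytes(-len(data) % stripe_bytes)
    stripes = []
    for offset in range(0, len(padded), stripe_bytes):
        blocks = [padded[offset + i * block_size: offset + (i + 1) * block_size]
                  for i in range(n_data_devices)]
        # The last entry of each stripe is the parity block.
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Recover one lost block by XOR-ing the surviving blocks of the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Because each stripe spreads its blocks over several devices, the data blocks of a stripe can in principle be transferred in parallel, and any single lost block, data or parity, can be recomputed from the remaining blocks of its stripe.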
The virtual tape storage subsystem is managed by a system controller which contains a plurality of software elements including: resource allocation, resource configuration, and resource management. The system controller may also contain security software for authentication. The resource allocation software has the responsibility to keep track of the resource usage, such as which data processor presently owns a given tape device, which tape devices are free to be allocated to the requesting data processors, and the like. The resource configuration software allows an operator to configure the data storage resources for the data processors which are attached to the network. The operator can assign the maximum number of tape devices that a data processor can have and designate the configuration of these tape devices. The resource configuration software automatically configures both the tape devices allocated to a data processor as well as the connection between the data processor and a tape device(s). The resource management software queues the request for the resource allocation and notifies the data processor when the requested resource is ready, or it can schedule the availability of the resources.
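The bookkeeping described above can be sketched as follows. This is an illustrative model under stated assumptions, not the system controller's actual design: the class `TapePoolAllocator` and its methods are hypothetical, requests are served in first-in, first-out order (the text does not specify a queueing policy), and the returned list of grants stands in for notifying the data processors.

```python
from collections import deque


class TapePoolAllocator:
    """Hypothetical sketch of the resource allocation and management roles."""

    def __init__(self, device_ids, max_per_processor):
        self.free = list(device_ids)      # devices not owned by any processor
        self.owner = {}                   # device id -> owning data processor
        self.max_per_processor = dict(max_per_processor)  # operator-set limits
        self.pending = deque()            # queued (processor, count) requests

    def owned_by(self, processor):
        return [d for d, p in self.owner.items() if p == processor]

    def request(self, processor, count):
        """Allocate `count` devices now, or queue the request and return None."""
        limit = self.max_per_processor.get(processor, 0)
        queued = sum(c for p, c in self.pending if p == processor)
        if len(self.owned_by(processor)) + queued + count > limit:
            raise ValueError(f"{processor} would exceed its limit of {limit}")
        if count > len(self.free):
            self.pending.append((processor, count))
            return None
        return self._grant(processor, count)

    def _grant(self, processor, count):
        granted = [self.free.pop(0) for _ in range(count)]
        for device in granted:
            self.owner[device] = processor
        return granted

    def release(self, processor, devices):
        """Return devices to the pool and serve queued requests in FIFO order."""
        for device in devices:
            if self.owner.get(device) != processor:
                raise ValueError(f"{device} is not owned by {processor}")
            del self.owner[device]
            self.free.append(device)
        notifications = []  # (processor, granted devices) pairs to announce
        while self.pending and self.pending[0][1] <= len(self.free):
            waiting, count = self.pending.popleft()
            notifications.append((waiting, self._grant(waiting, count)))
        return notifications
```

In this sketch a request that cannot be satisfied immediately is queued, and releasing devices returns the grants that have become possible, which corresponds to notifying a data processor when its requested resource is ready.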
The use of a networked storage manager enables the tape devices to be managed as a pool and yet be attached directly to the network as individual resources. The networked storage manager must provide the mechanism for the enterprise management to control tape device allocation and configuration as well as other functions, such as tape cartridge movement and data migration. The rules which could be implemented address response time constraints, data file transfer size, data file transfer rates, data file size bounds and the like. The networked storage manager manages the allocation, configuration and security.