Storage systems are commonly configured into Storage Area Networks (SANs) to facilitate connection of multiple server hosts on a network, enabling shared access to various connected storage resources.
A common protocol for transferring data between a host processor and one or more storage devices is the Small Computer Systems Interface (SCSI) protocol, for example under UNIX™ and Windows NT™ operating systems. A host bus adapter (HBA) plugs into a bus slot on the server's internal bus and connects a SCSI cable to the storage device or subsystem, thereby creating a connection between server and storage devices. The host bus adapter enables a host or server to function as an initiator that begins transfer of data to and from a target device. Traditionally the SCSI protocol supports a combination of initiators and targets on a common bus, although configurations are conventionally limited to one initiator due to a lack of shared device management capability of available operating systems and physical limits of cabling.
Attempts to share target devices such as SCSI tape drives among multiple initiators expose several difficulties in conventional SAN configurations. Typically, configurations with multiple shared targets are designed primarily for inclusion of only a single initiator. Attempts to expand the number of initiators in a configuration by various techniques, such as addition of a bridge or router, fail to solve problems of availability, data integrity, and performance.
Routers handle multiple initiator operations by queuing commands. Queuing enables processing of each initiator's commands but creates timing problems imposed by the initiator's assumption of target ownership. An initiator that fails to receive a response from a target within timing specifications typically responds to the timing violation by initiating error recovery operations. In turn, the error recovery operations may affect data transfer of another initiator. Near simultaneous data transfer requests by multiple initiators can instigate multiple recovery loops and multiple failed backup/restore operations that result in loss of availability of the target resource to a user or subscriber.
Queuing can also cause data integrity errors in a multiple-initiator configuration. An initiator that begins a data transfer command sequence to a tape drive operates on a presumption of specific state information concerning the drive, including media position. These presumptions are invalidated when queuing interleaves commands from different initiators. Command interleaving has the potential to change the drive's state, causing data transfer failure and possibly data corruption on the tape.
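By way of illustration only, the following minimal Python sketch models how an interleaved command can invalidate an initiator's presumption of media position. The TapeDrive class, the run helper, and the command tuples are hypothetical names introduced for this example; they are a toy model of a sequential-access device and do not represent any actual drive or router firmware.

```python
class TapeDrive:
    """Toy sequential-access device with a single shared media position."""

    def __init__(self):
        self.blocks = []    # data blocks on the tape, in physical order
        self.position = 0   # current media position

    def write(self, data):
        # Writing at a position discards everything after it, as on tape.
        del self.blocks[self.position:]
        self.blocks.append(data)
        self.position += 1

    def rewind(self):
        self.position = 0


def run(queue):
    """Apply queued (initiator, command, argument) tuples in arrival order."""
    drive = TapeDrive()
    for _initiator, command, argument in queue:
        if command == "write":
            drive.write(argument)
        elif command == "rewind":
            drive.rewind()
    return drive.blocks


# Initiator A alone: three sequential writes land where A expects them.
uninterrupted = run([
    ("A", "write", "A0"),
    ("A", "write", "A1"),
    ("A", "write", "A2"),
])

# Initiator B's rewind is queued between A's writes and moves the media.
interleaved = run([
    ("A", "write", "A0"),
    ("A", "write", "A1"),
    ("B", "rewind", None),
    ("A", "write", "A2"),
])
```

In this model the uninterrupted sequence leaves all three blocks on the tape, whereas the interleaved sequence leaves only the last block, because initiator A's final write overwrites its own earlier data from the rewound position.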
Queuing can also negatively impact performance. Interleaved commands that change drive state can disrupt performance by delays incurred while returning to the appropriate state. Even interleaved commands that do not change device state, such as inquiry and log sense commands, can potentially impact performance for tape drives having an optimization to operate in a faster mode for an uninterrupted sequence of data transfer commands.
Difficulties raised by multiple initiators in a Storage Area Network are addressed using various storage device management methods. Conventional management methods include access controls, switch zoning, SCSI reserve/release commands from initiators, and inquiry caching. Other methods include custom target reset handling in a router, dual initiator identifiers, imposing a requirement for homogeneous backup applications, and usage of management processes to manually protect a tape drive from tape resource requests by other activity during a backup/restore window. None of these methods is a general solution to all difficulties involved in usage of multiple initiators in a SAN. Most of the techniques were designed to solve other problems but have been found to have some utility in mitigating multiple-initiator difficulties in some applications. Some methods, for example inquiry caching, address a common cause of availability problems, but fail to address other multiple-initiator difficulties. Even combinations of the various techniques are ineffective in handling the multiple-initiator difficulties and generally only have utility in solving problems in special cases.
Management access controls enable enterprises to restrict management service access to a specific set of end points, for example IP addresses, device ports, or switch World Wide Names (WWNs). Access controls are typically implemented in router firmware and restrict access to devices behind the router to specified initiators. Access controls can disable front-panel access to switches, and manage device and switch connections. Device Connection Controls (DCCs) such as WWN Access Control Lists (ACLs) or Port ACLs enable binding of individual device ports to a set of one or more switch ports. Device ports are specified by WWN and typically represent Host Bus Adapters (HBAs), also called servers. DCCs secure server-to-fabric connections for both normal operations and management functions. DCCs bind a specific WWN to a specific switch port or set of ports to prevent a port in another physical location from assuming the identity of a WWN, controlling shared switch environments by enabling only an authorized set of WWNs to access particular ports in a fabric. Switch Connection Controls (SCCs) restrict fabric connections to a WWN-designated set of switches that are mutually authenticated for switch-to-switch connectivity, for example using digital certificates and unique public/private keying.
Access controls are generally useful to limit tape drive access to backup servers, blocking access to all other servers on the SAN. Access controls fail to address availability, data integrity, and performance issues because SANs can contain multiple backup servers and thus have multiple initiators.
Switch zoning is typically a feature implemented in switch firmware that is commonly used to restrict access to a router and library devices connected to the router to initiators at specified switch ports. Switch zoning is a SAN-partitioning technique that narrows traffic through a storage-networking device so that specific ports on a switch or hub can only access other specific ports. Switch zoning uses masking to the node port level for nodes that are accessible by a switch. Logical Unit Numbers (LUNs) attached to a port node can be masked from hosts that do not access that port. Switch zoning cannot mask individual LUNs arranged behind a port. Instead, all hosts connected to the same port can access all LUNs addressed through that port. In essence, switch zoning converts the physical topology of a network to a logical representation consisting of multiple separate networks.
Fabric switches require any node that attaches to a switch to log in to the switch and register the node's World Wide Name (WWN) in the Simple Name Server (SNS) function of the switch, which assigns a unique address to the WWN. Host drivers can detect targets through SNS lookup rather than surveying the entire network. The SNS can be zoned by WWN or by port. WWN zoning facilitates dynamic changes to suit conditions. For example, a tape library can be moved to different zones at various times to restrict access during backup. Also, with WWN zoning a node can be moved to a different port address without changing zones.
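As a hedged illustration of WWN zoning, the following Python sketch models a name server that registers WWNs at login and answers lookups only with nodes that share a zone with the requester. The NameServer class, its method names, and the placeholder WWN strings are assumptions made for this example and do not correspond to any actual switch firmware interface.

```python
class NameServer:
    """Toy model of a zoned name server keyed by WWN (hypothetical API)."""

    def __init__(self):
        self.address_of = {}     # WWN -> assigned fabric address
        self.zones = {}          # zone name -> set of member WWNs
        self._next_address = 1

    def login(self, wwn):
        # Register the WWN and assign it a unique address on first login.
        if wwn not in self.address_of:
            self.address_of[wwn] = self._next_address
            self._next_address += 1
        return self.address_of[wwn]

    def set_zone(self, name, members):
        # Define or redefine a zone as a set of WWNs.
        self.zones[name] = set(members)

    def lookup(self, requester):
        # Return logged-in WWNs that share at least one zone with requester.
        visible = set()
        for members in self.zones.values():
            if requester in members:
                visible |= members
        visible.discard(requester)
        return sorted(w for w in visible if w in self.address_of)


sns = NameServer()
for wwn in ("wwn-host-a", "wwn-host-b", "wwn-tape"):   # placeholder WWNs
    sns.login(wwn)

# Daytime: only host A is zoned with the tape library.
sns.set_zone("backup", {"wwn-host-a", "wwn-tape"})
day_a = sns.lookup("wwn-host-a")
day_b = sns.lookup("wwn-host-b")

# Later: the tape library is moved to a zone shared with host B instead.
sns.set_zone("backup", {"wwn-host-a"})
sns.set_zone("night", {"wwn-host-b", "wwn-tape"})
night_a = sns.lookup("wwn-host-a")
night_b = sns.lookup("wwn-host-b")
```

Because zone membership in this sketch is keyed by WWN rather than by port, the address assigned at login plays no part in visibility, which mirrors the property that a node can change ports without a zone change.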
Unfortunately, switch zoning can lead to a security breach by unauthorized usage of a WWN. Another difficulty is that switch zoning supports initiators and targets that attach to a switch and does not assist security beyond port level of a storage subsystem. Switch zoning cannot mask LUNs from initiators that access the same storage port.
Switch zoning does not address issues of availability, data integrity, and performance because multiple initiators can still be zoned to access the library. Switch zoning may also restrict flexibility of the SAN by limiting the backup servers that can access backup devices. Switch zoning is typically difficult to manage and does not scale well as the SAN grows.
SCSI reserve/release commands directed to a SCSI switch can be used to share peripheral devices between two host computers. Either an operating system or backup application can issue reserve/release commands to reserve library devices for a single initiator. Generally the tape device and the router manage the reservation. Reservation of a peripheral device, such as a tape drive, using the SCSI reserve/release command, causes attempts to access the peripheral from other initiators to be rejected until the first initiator releases the peripheral device. Reserve/release commands typically have limited utility due to interoperability difficulties between different backup applications and different versions of the same backup application, and inconsistencies that result when servers use different operating systems. In addition to interoperability flaws, reserve/release commands can create problems when a server holding a reservation loses power or is rebooted, leaving the reserved device unavailable. Other difficulties include omission of reserve/release support in some operating systems. Furthermore, SCSI buses of two hosts cannot always be connected, in which case reserve/release support is unavailable.
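The reservation behavior described above, including the stale reservation left by a failed server, can be sketched as follows. This is a simplified Python model under stated assumptions: the ReservableDevice class and its methods are hypothetical, only the status names GOOD and RESERVATION CONFLICT are taken from SCSI, and treating a release by a non-holder as a harmless no-op is an assumption of the sketch.

```python
GOOD = "GOOD"
CONFLICT = "RESERVATION CONFLICT"


class ReservableDevice:
    """Toy model of a device supporting reserve/release (hypothetical API)."""

    def __init__(self):
        self.holder = None   # initiator currently holding the reservation

    def reserve(self, initiator):
        # Grant the reservation if the device is free or already held by
        # the same initiator; otherwise reject the request.
        if self.holder in (None, initiator):
            self.holder = initiator
            return GOOD
        return CONFLICT

    def release(self, initiator):
        # Only the holder's release clears the reservation. A release from
        # any other initiator is modeled as a no-op (assumption).
        if self.holder == initiator:
            self.holder = None
        return GOOD

    def access(self, initiator):
        # Any access from a non-holder is rejected while reserved.
        if self.holder not in (None, initiator):
            return CONFLICT
        return GOOD


drive = ReservableDevice()
first = drive.reserve("server-a")          # server A reserves the drive
blocked = drive.access("server-b")         # server B is rejected

# Server A loses power here and never issues a release. Nothing in the
# device clears the reservation, so server B remains locked out.
still_blocked = drive.access("server-b")
stolen = drive.release("server-b")         # no effect on A's reservation
after_steal = drive.access("server-b")

# Only a release attributed to server A frees the device.
drive.release("server-a")
recovered = drive.access("server-b")
```

The sketch shows why the mechanism depends on every initiator releasing cleanly: the device has no notion of whether the reservation holder is still operating.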
An inquiry cache is a cache of inquiry responses allocated for individual devices. Inquiry caching generally runs in a router. The inquiry cache is used to establish connections between an initiator and a peripheral device, such as a tape drive. The cache for the peripheral devices holds requests from initiators and allocates access to the initiators. Inquiry caching is effective when a secondary server issues inquiry commands. However, inquiry caching is ineffective in facilitating availability, data integrity, and performance when secondary servers issue other commands, such as log sense.
Some systems use dual initiator identifiers, typically executing in a router, to attain availability via redundancy. For example, Redundant Arrays of Inexpensive Disks (RAID) systems with multiple host or dual-loop capabilities seek fault resilience by enabling access by multiple hubs or switch ports. Likewise, host adapters or initiators can attach to multiple switches or hubs for improved performance and redundancy. Systems can be arranged in dual-initiator/dual-bus or dual-initiator/dual-loop configurations. The dual-initiator/dual-bus configuration gives high availability by protecting against host failure. For example, each host can have two host bus adapters, each connected by a fibre channel loop to a separate peripheral in the storage system. Dual initiator identifiers (IDs) are effective when the tape drive can handle untagged queued commands, but have been ineffective in practice because tape drives generally do not support untagged queuing.
Homogeneous backup applications typically require that all backup operations using a library are the same, regardless of initiator operating system. With or without the reserve/release command, a homogeneous application enables multiple initiators to intercommunicate, usually through a network interface, and coordinate tape device sharing. Although some systems may specify that backup applications are homogeneous, difficulties with availability, data integrity, and performance are not resolved. For example, if an administrator or script issues commands during a backup window that change drive state, homogeneity is breached.
Some applications use management processes to manually ensure no other initiator conducts operations to a tape drive during a backup/restore window, effectively handling availability, data integrity, and performance issues if achievable. However, usage of management processes is difficult because manual protection of tape drive operations does not scale well as the SAN changes or increases in size.
In accordance with an embodiment of a system capable of managing access to a physical device from among a plurality of initiators, an apparatus comprises a data path capable of coupling a physical device to a plurality of initiators. An interface is coupled to the data path and forms a command pathway between the plurality of initiators and the physical device. A controller is coupled to the data path and coupled to the interface. The controller comprises an executable process that creates a virtual device object that resolves conflicting concurrent attempts to access the physical device by a plurality of initiators. The virtual device object is capable of protecting state of the physical device during successive data transfer and media movement operations by emulating responses of the physical device and redirecting access to the physical device when the physical device becomes available.
In accordance with another embodiment, a system capable of managing access to a physical device from among a plurality of initiators comprises a virtual device capable of emulating at least one behavior of the physical device, a command filter, and a monitor. The command filter is capable of communicating with the plurality of initiators and selectively directing initiator requests to the physical device and the virtual device based on physical device state. The monitor is coupled to the command filter and the physical device and capable of determining state of the physical device and communicating a physical device state signal to the command filter.
In accordance with other embodiments, a system capable of managing traffic on a data path between a physical device and a plurality of initiators comprises a virtual device, a monitor, and a command filter. The virtual device is capable of emulating at least one behavior of the physical device. The monitor is coupled to the data path and is capable of analyzing multiple conditions in a background process that extracts a physical device state parameter from the multiple conditions. The physical device state parameter directs filtering of the command filter. The command filter is capable of alternatively directing commands to the virtual device and the physical device based on the physical device state parameter.
In accordance with further embodiments, a method of managing access of a plurality of initiators to a physical device comprises monitoring requests from the plurality of initiators to the physical device and status of the physical device. The method further comprises determining whether the physical device is bound to one of the plurality of initiators, creating a virtual device that emulates at least one action of the physical device, and directing a request from an unbound initiator to the virtual device if the physical device is bound.
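By way of non-limiting illustration, the method of the preceding paragraph can be sketched in Python as a command filter that directs requests from unbound initiators to a virtual device while the physical device is bound. The class names, the explicit bind and unbind calls, and the string responses are hypothetical and are assumptions of this example; in particular, the manner in which binding is detected and the responses that the virtual device emulates are not specified here.

```python
class PhysicalDevice:
    """Stand-in for the real device; records every command it receives."""

    def __init__(self):
        self.log = []

    def execute(self, initiator, command):
        self.log.append((initiator, command))
        return f"physical:{command}"


class VirtualDevice:
    """Emulates a response without touching the physical device."""

    def execute(self, initiator, command):
        return f"virtual:{command}"


class CommandFilter:
    """Routes each request according to the binding state of the device."""

    def __init__(self, physical, virtual):
        self.physical = physical
        self.virtual = virtual
        self.bound_to = None   # initiator currently bound, if any

    def bind(self, initiator):
        # Assumption: binding is signaled explicitly in this sketch.
        if self.bound_to is None:
            self.bound_to = initiator

    def unbind(self, initiator):
        if self.bound_to == initiator:
            self.bound_to = None

    def route(self, initiator, command):
        # A request from an unbound initiator goes to the virtual device
        # whenever the physical device is bound to another initiator.
        if self.bound_to is not None and initiator != self.bound_to:
            return self.virtual.execute(initiator, command)
        return self.physical.execute(initiator, command)


physical = PhysicalDevice()
command_filter = CommandFilter(physical, VirtualDevice())

command_filter.bind("A")
bound_reply = command_filter.route("A", "WRITE")        # reaches the drive
unbound_reply = command_filter.route("B", "INQUIRY")    # emulated response
log_while_bound = list(physical.log)

command_filter.unbind("A")
later_reply = command_filter.route("B", "INQUIRY")      # drive is available
```

In this sketch the physical device never observes the unbound initiator's command during the bound interval, so the state presumed by the bound initiator is left undisturbed.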