1. Field of Invention
The present invention relates to a method for performing host-side IO rerouting in a redundant storage virtualization subsystem.
2. Description of Related Art
Storage virtualization is a technology that has been used to virtualize physical storage by combining sections of physical storage devices (PSDs) into logical storage entities that are made accessible to a host system. These entities are herein referred to as logical media units and are explained in more detail below. This technology has been used primarily in redundant arrays of independent disks (RAID) storage virtualization, which combines smaller physical storage devices into larger, fault-tolerant, higher-performance logical media units via RAID technology.
A logical media unit, abbreviated LMU, is a storage entity whose individual storage elements (e.g., storage blocks) are uniquely addressable by a logical storage address. One common example of an LMU is the presentation of the physical storage of an HDD to a host over the host IO-device interconnect. In this case, while on the physical level the HDD is divided into cylinders, heads and sectors, what is presented to the host is a contiguous set of storage blocks (sectors), each addressed by a single logical block address (LBA). Another example is the presentation of a storage tape to a host over the host IO-device interconnect.
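The contrast between the physical cylinder/head/sector (CHS) layout of an HDD and the linear block addresses presented to the host can be illustrated with the conventional CHS-to-LBA conversion. The following is a minimal sketch, assuming a fixed drive geometry (`heads_per_cylinder` and `sectors_per_track` are hypothetical parameters) and 1-based sector numbering as in traditional CHS addressing. Modern drives do not expose a fixed physical geometry, so this is illustrative only.

```python
from typing import Tuple


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a CHS triple to a linear logical block address.

    Assumes a fixed, hypothetical geometry; CHS sectors are numbered from 1.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int,
               sectors_per_track: int) -> Tuple[int, int, int]:
    """Inverse mapping: recover the CHS triple for a linear LBA."""
    track, sector_index = divmod(lba, sectors_per_track)
    cylinder, head = divmod(track, heads_per_cylinder)
    return cylinder, head, sector_index + 1


# The host sees only LBAs 0, 1, 2, ...; the CHS layout stays hidden behind them.
print(chs_to_lba(1, 0, 1, heads_per_cylinder=16, sectors_per_track=63))  # 1008
print(lba_to_chs(1008, heads_per_cylinder=16, sectors_per_track=63))     # (1, 0, 1)
```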
A Storage Virtualization Controller, abbreviated SVC, is a device whose primary purpose is to map combinations of sections of physical storage media to LMUs visible to a host system. IO requests received from the host system are parsed and interpreted, and the associated operations and data are translated into physical storage device IO requests. This process may be indirect, with operations cached, delayed (e.g., write-back), anticipated (e.g., read-ahead), grouped, etc., to improve performance and other operational characteristics, so that a host IO request does not necessarily result in physical storage device IO requests in a one-to-one fashion.
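To make the mapping role of an SVC concrete, the sketch below models an LMU built by concatenating sections of several PSDs and translates a host-side LMU block address into a (PSD, physical block) pair. The `Section` type and the concatenation layout are hypothetical simplifications. A real SVC would also apply caching, write-back, read-ahead and RAID combination, so one host request need not map one-to-one onto PSD requests.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Section:
    """A contiguous run of blocks on one PSD (hypothetical model)."""
    psd_id: str
    start_block: int
    length: int


def map_lmu_block(sections: List[Section], lmu_block: int) -> Tuple[str, int]:
    """Translate an LMU block address into (PSD id, physical block).

    The LMU is assumed to be the sections laid end to end, in list order.
    """
    if lmu_block < 0:
        raise ValueError("negative block address")
    offset = lmu_block
    for section in sections:
        if offset < section.length:
            return section.psd_id, section.start_block + offset
        offset -= section.length
    raise ValueError("block address beyond end of LMU")


lmu = [Section("psd0", 2048, 1000), Section("psd1", 0, 500)]
print(map_lmu_block(lmu, 10))    # ('psd0', 2058)
print(map_lmu_block(lmu, 1200))  # ('psd1', 200)
```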
An External (sometimes referred to as “Stand-alone”) Storage Virtualization Controller is a Storage Virtualization Controller that connects to the host system via an IO interface and that is capable of supporting connection to devices that reside external to the host system and, otherwise, operates independently of the host.
One example of an external Storage Virtualization Controller is an external, or stand-alone, direct-access RAID controller. A RAID controller combines sections on one or multiple physical storage devices (PSDs), the combination of which is determined by the nature of a particular RAID level, to form LMUs that are contiguously addressable by a host system to which the LMU is made available. A single RAID controller will typically support multiple RAID levels so that different LMUs may consist of sections of PSDs combined in different ways by virtue of the different RAID levels that characterize the different units.
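How a RAID level determines the combination of PSD sections can be illustrated with RAID 0 (striping), chosen here only as one example level. The sketch assumes a hypothetical fixed stripe-unit size expressed in blocks and `num_psds` member PSDs of identical capacity, and maps a contiguous LMU address to a member PSD and a block on that PSD. Other levels, such as mirroring or parity-based layouts, combine the same sections differently. This is why different LMUs on one controller can have different layouts.

```python
from typing import Tuple


def raid0_map(lmu_block: int, num_psds: int, stripe_unit_blocks: int) -> Tuple[int, int]:
    """Map an LMU block to (member PSD index, block on that PSD) for RAID 0.

    Consecutive stripe units are placed round-robin across the members.
    """
    if num_psds < 1 or stripe_unit_blocks < 1:
        raise ValueError("need at least one PSD and a positive stripe unit")
    unit_index, offset_in_unit = divmod(lmu_block, stripe_unit_blocks)
    psd_index = unit_index % num_psds
    # Each full round across all members advances one stripe unit on every PSD.
    row = unit_index // num_psds
    return psd_index, row * stripe_unit_blocks + offset_in_unit


# 4 PSDs, 128-block stripe units.
print(raid0_map(0, 4, 128))    # (0, 0)
print(raid0_map(130, 4, 128))  # (1, 2)
print(raid0_map(520, 4, 128))  # (0, 136)
```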
Another example of an external Storage Virtualization Controller is a JBOD emulation controller. A JBOD, short for “Just a Bunch of Drives”, is a set of PSDs that connect directly to a host system via one or more multiple-device IO device interconnect channels. PSDs that implement point-to-point IO device interconnects to connect to the host system (e.g., Parallel ATA HDDs, Serial ATA HDDs, etc.) cannot be directly combined to form a “JBOD” system as defined above, because they do not allow the connection of multiple devices directly to the IO device channel.
Another example of an external Storage Virtualization Controller is a controller for an external tape backup subsystem.
A Storage Virtualization Subsystem consists of one or more of the above-mentioned SVCs or external SVCs, and at least one PSD connected thereto to provide storage therefor.
A redundant SVS is an SVS comprising two or more SVCs configured redundantly. The primary motivation for configuring a pair of Storage Virtualization Controllers as a redundant pair is to allow continued, uninterrupted access to data by the host even in the event of a malfunction or failure of a single SVC. This is accomplished by incorporating functionality into the SVCs that allows one controller to take over for the other if the other becomes handicapped or completely incapacitated. On the device side, this requires that both controllers be able to access all of the PSDs managed by the SVCs, regardless of which SVC a given PSD is initially assigned to. On the host side, this requires that each SVC be able to present and make available to the host all accessible resources, including those originally assigned to the alternate SVC, in the event that its mate does not initially come on line or goes off line at some point (e.g., due to a malfunction/failure, maintenance operation, etc.).
A typical device-side implementation of this would be one in which device-side IO device interconnects are of the multiple-initiator, multiple-device kind, and all device-side IO device interconnects are connected to both SVCs such that either SVC can access any PSD connected on a device-side IO device interconnect. When both SVCs are on-line and operational, each PSD would be managed by one or the other SVC, typically determined by user setting or configuration. As an example, all member PSDs of a LMU that consists of a RAID combination of PSDs would be managed by the particular SVC to which the LMU itself is assigned.
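The device-side ownership rule described above can be sketched as follows: member PSDs follow the SVC that owns their LMU, and either SVC can reach any PSD. The controller names `"SVC-A"`/`"SVC-B"` and the dictionaries are hypothetical configuration data. The point illustrated is that when the owning SVC is off line, the survivor manages the PSDs, which is possible only because both controllers are connected to every device-side interconnect.

```python
from typing import Dict, List, Set


def psd_manager(psd: str, lmu_members: Dict[str, List[str]],
                lmu_owner: Dict[str, str], online: Set[str]) -> str:
    """Return the SVC that manages a PSD (hypothetical two-SVC model)."""
    for lmu, members in lmu_members.items():
        if psd in members:
            owner = lmu_owner[lmu]
            if owner in online:
                return owner
            # Owner is off line: any surviving SVC can reach the PSD.
            survivors = sorted(online)
            if not survivors:
                raise RuntimeError("no SVC on line")
            return survivors[0]
    raise KeyError(f"{psd} is not a member of any LMU")


members = {"lmu0": ["psd0", "psd1"], "lmu1": ["psd2", "psd3"]}
owner = {"lmu0": "SVC-A", "lmu1": "SVC-B"}
print(psd_manager("psd2", members, owner, {"SVC-A", "SVC-B"}))  # SVC-B
print(psd_manager("psd2", members, owner, {"SVC-A"}))           # SVC-A
```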
A typical host-side implementation would consist of multiple-device IO device interconnects to which the host(s) and both SVCs are connected; on each interconnect, each SVC presents its own unique set of device IDs, to which sections of LMUs are mapped. If a particular SVC does not come on line or goes off line, the on-line SVC presents both sets of device IDs on the host-side interconnect, its own set together with the set normally assigned to its mate, and maps sections of LMUs to these IDs in exactly the same way they are mapped when both SVCs are on line and fully operational. In this kind of implementation, the host needs no special functionality for switching from one device/path to another in order to maintain access to all sections of LMUs when an SVC is not on line or goes off line. This kind of implementation is commonly referred to as “transparent” redundancy.
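Transparent redundancy on the host side can be sketched as a function that returns the set of device IDs an SVC presents on one host-side interconnect. The ID values and the two-SVC assignment are hypothetical. The sketch captures the behavior described in the text: a surviving SVC presents its mate's ID set in addition to its own. Because the ID-to-LMU-section mapping is unchanged in that case, only the ID sets are modeled, and the host sees the same IDs whether one or both SVCs are on line.

```python
from typing import Dict, FrozenSet

# Hypothetical per-SVC device-ID sets on one host-side interconnect.
ID_SETS: Dict[str, FrozenSet[int]] = {
    "SVC-A": frozenset({0, 1, 2}),
    "SVC-B": frozenset({3, 4, 5}),
}


def presented_ids(svc: str, mate: str, mate_online: bool) -> FrozenSet[int]:
    """Device IDs that `svc` presents to the host."""
    if mate_online:
        return ID_SETS[svc]
    # Take over the mate's IDs so the host needs no path-switching logic.
    return ID_SETS[svc] | ID_SETS[mate]


def host_visible(online: Dict[str, bool]) -> FrozenSet[int]:
    """Union of the IDs the host sees across both SVCs."""
    a = presented_ids("SVC-A", "SVC-B", online["SVC-B"]) if online["SVC-A"] else frozenset()
    b = presented_ids("SVC-B", "SVC-A", online["SVC-A"]) if online["SVC-B"] else frozenset()
    return a | b


print(sorted(host_visible({"SVC-A": True, "SVC-B": True})))   # [0, 1, 2, 3, 4, 5]
print(sorted(host_visible({"SVC-A": True, "SVC-B": False})))  # [0, 1, 2, 3, 4, 5]
```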
Redundant SVC configurations are typically divided into two categories. The first is “active-standby”, in which one SVC presents, manages and processes all IO requests for all LMUs in the Storage Virtualization Subsystem (abbreviated SV subsystem or SVS) while the other SVC simply stands by, ready to take over in the event that the active SVC becomes handicapped or incapacitated. The second is “active-active”, in which both SVCs concurrently present, manage and process IO requests for the various LMUs that are present in the SVS. In active-active configurations, each SVC is always ready to take over for the other should the other malfunction and become handicapped or incapacitated. Active-active configurations typically provide better performance because the resources of both SVCs (e.g., CPU time, internal bus bandwidth, etc.) can be brought to bear in servicing IO requests rather than the resources of only one.
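The difference between the two categories lies in which SVC processes a given LMU's IO requests. The sketch below uses hypothetical names (`mode`, `lmu_owner`, `active_svc`). In active-standby mode every LMU is served by the single active SVC. In active-active mode each LMU is served by its assigned owner, so work is spread across both controllers. In both modes, the LMUs of a failed SVC move to the survivor.

```python
from typing import Dict, Set


def serving_svc(lmu: str, mode: str, lmu_owner: Dict[str, str],
                active_svc: str, online: Set[str]) -> str:
    """Pick the SVC that processes IO for `lmu` (hypothetical two-SVC model)."""
    if mode == "active-standby":
        preferred = active_svc
    elif mode == "active-active":
        preferred = lmu_owner[lmu]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if preferred in online:
        return preferred
    # Take-over: the surviving SVC processes the IO instead.
    survivors = sorted(online - {preferred})
    if not survivors:
        raise RuntimeError("no SVC on line")
    return survivors[0]


owners = {"lmu0": "SVC-A", "lmu1": "SVC-B"}
both = {"SVC-A", "SVC-B"}
print(serving_svc("lmu1", "active-standby", owners, "SVC-A", both))     # SVC-A
print(serving_svc("lmu1", "active-active", owners, "SVC-A", both))      # SVC-B
print(serving_svc("lmu1", "active-active", owners, "SVC-A", {"SVC-A"}))  # SVC-A
```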
Another essential element of a redundant SV subsystem is the ability for each SVC to monitor the status of the other. Typically, this would be accomplished by implementing inter-controller communications channels (abbreviated ICC channel) between the two SVCs over which they can exchange operating status. These communications channels may be dedicated, the sole function of which is to exchange parameters and data relating to the operation of the redundant SV subsystem, or they could be one or more of the IO device interconnects, host-side or device-side, over which operational parameter and data exchange is multiplexed together with host-SVC or device-SVC IO-request-associated data on these interconnects. They could also be a combination of dedicated and multiplexed interconnects.
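Status monitoring over an ICC channel is often realized as a periodic heartbeat. The text does not prescribe a mechanism, so the following is an assumed example: each SVC records when it last received a status message from its mate and declares the mate failed if no message arrives within a timeout. `HeartbeatMonitor` and its `timeout` parameter are hypothetical. Timestamps (in seconds) are passed in explicitly so the logic stays deterministic and independent of whether the ICC is dedicated or multiplexed on an IO device interconnect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks a mate SVC's liveness from ICC status messages (hypothetical)."""
    timeout: float
    last_seen: Optional[float] = None

    def record_heartbeat(self, now: float) -> None:
        """Call whenever a status message from the mate arrives over the ICC."""
        self.last_seen = now

    def mate_alive(self, now: float) -> bool:
        if self.last_seen is None:
            return False  # mate never came on line
        return now - self.last_seen <= self.timeout


monitor = HeartbeatMonitor(timeout=0.5)
print(monitor.mate_alive(0.0))  # False
monitor.record_heartbeat(1.0)
print(monitor.mate_alive(1.4))  # True
print(monitor.mate_alive(1.6))  # False
```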
Yet another important element of a redundant SV subsystem is the ability of one SVC to completely incapacitate the other so that it can completely take over for the other SVC without interference. For example, for the surviving SVC to take on the identity of its mate, it may need to take on the device IDs that the SVC going off line originally presented on the host-side IO device interconnect, which, in turn, requires that the SVC going off line relinquish its control over those IDs. This “incapacitation” is typically accomplished by asserting the reset signal lines of the controller being taken off line, bringing all externally connected signal lines to a pre-defined state that eliminates the possibility of interference with the surviving SVC. Interconnecting the reset lines of the SVCs so that one can reset the other in this event is one common way of achieving this. Another way is to build in the ability of an SVC to detect when it may be malfunctioning and “kill” itself by asserting its own reset signals (e.g., by including a “watchdog” timer that asserts a reset signal should the program running on the SVC fail to poll it within a predefined interval), likewise bringing all externally connected signal lines to a pre-defined state that eliminates the possibility of interference with the surviving SVC.
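The self-incapacitation path can be sketched as a watchdog that asserts reset if the firmware fails to poll it within a predefined interval. This is a software model only. `Watchdog`, `poll`, `check` and the `assert_reset` callback are hypothetical names, and a real watchdog is usually a hardware timer that drives the reset line directly. The simulated time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, List


class Watchdog:
    """Software model of a watchdog timer (hypothetical)."""

    def __init__(self, interval: float, assert_reset: Callable[[], None],
                 now: float = 0.0) -> None:
        self.interval = interval
        self.assert_reset = assert_reset
        self.last_poll = now
        self.fired = False

    def poll(self, now: float) -> None:
        """Called periodically by healthy firmware to restart the interval."""
        if not self.fired:
            self.last_poll = now

    def check(self, now: float) -> bool:
        """Timer tick: assert reset once if the firmware missed its deadline."""
        if not self.fired and now - self.last_poll > self.interval:
            self.fired = True
            # Models driving all outputs to a predefined, non-interfering state.
            self.assert_reset()
        return self.fired


resets: List[str] = []
wd = Watchdog(interval=1.0, assert_reset=lambda: resets.append("reset"))
wd.poll(0.5)
print(wd.check(1.2))  # False
print(wd.check(2.0))  # True
print(resets)         # ['reset']
```

Once fired, the model ignores further polls, which mirrors the intent that a controller that has reset itself stays off the interconnects until it is deliberately brought back on line.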