Network attached storage (NAS) and storage area networks (SANs) are two recent technologies that attempt to allow computers to access network-connected hard disk drives and other mass storage devices using block-level commands, so that the networked storage appears to be physically attached to the workstation. In a NAS, the storage device connects directly to the network medium and does not require an intermediate server to provide access to the storage. In a SAN, a separate network of storage devices forms storage space that is allocated to different workstations, and this separate network is itself connected to the network medium that connects the different workstations.
Conventional SANs do not perfectly solve all the mass storage needs of an enterprise. In particular, maintenance and provisioning of the storage space within a conventional SAN is difficult to accomplish and wasteful of physical resources. To address these concerns, many recent developments in this field have involved virtualizing the storage space so that there is little or no correlation between the physical disk drive devices where the data actually resides and the logical disk drive devices that are the targets of a workstation's data access request. One product known in the industry that provides a substantially virtualized view of the storage space within a SAN is the MAGNITUDE™ SAN manufactured by Xiotech Corporation of Eden Prairie, Minn.
The MAGNITUDE™ SAN aggregates physical drives into a centralized “virtualized” storage pool and can stripe across and utilize all available space in that pool. From this pool, a user carves out storage into “virtualized disks” and assigns that storage to whichever workstation needs it. Within the SAN, the workstations see the MAGNITUDE™ SAN's virtual disks as Logical Unit Numbers (LUNs). Within the MAGNITUDE™ SAN, virtualization refers to different levels of logical constructs rather than to physical storage devices (e.g., SCSI hard disk drives).
The MAGNITUDE™ SAN is responsible for presenting the available virtualized disks as addressable devices on the Fibre Channel fabric. As a result, remote servers and workstations need only generate a typical block-level command (e.g., SCSI-3 command) to access blocks on an available logical drive. The MAGNITUDE™ SAN, however, receives this conventional protocol request and converts it into a virtual request packet (VRP) for internal processing. The MAGNITUDE™ SAN internally unencapsulates, parses and processes a VRP message utilizing translation tables in order to eventually generate, for example, SCSI commands to access multiple SCSI devices. The MAGNITUDE™ SAN enforces access controls at the virtualized disk level. Individual virtualized disks can be assigned to a specific workstation to allow the workstation and its storage to be isolated from another workstation and its storage.
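The access control described above, in which each virtualized disk is assigned to a specific workstation, can be illustrated with a minimal sketch. The class name `AccessTable`, its methods, and the workstation identifiers are hypothetical and chosen only for illustration; they do not describe the MAGNITUDE™ SAN's actual data structures or VRP format.

```python
from typing import Dict


class AccessTable:
    """Hypothetical per-virtual-disk access table (illustrative only)."""

    def __init__(self) -> None:
        # Maps a LUN (virtualized disk) to the one workstation assigned to it.
        self._owner: Dict[int, str] = {}

    def assign(self, lun: int, workstation: str) -> None:
        """Assign a virtualized disk to a specific workstation."""
        self._owner[lun] = workstation

    def is_allowed(self, lun: int, workstation: str) -> bool:
        """Return True only if the workstation is the one assigned to the LUN."""
        return self._owner.get(lun) == workstation


table = AccessTable()
table.assign(0, "workstation-a")
table.assign(1, "workstation-b")
```

With this table, a request from `workstation-b` for LUN 0 would be rejected before any translation to physical devices takes place, which is one way the isolation between workstations could be achieved.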
Within the MAGNITUDE™ SAN system, for example, there is at least one controller having at least one processor, memory, and support circuits for presenting storage space to the servers by directing and controlling access to the disk storage subsystem. The controller also includes firmware that, when executed by the processor, performs the many levels of translation needed to receive a request involving a virtualized drive and actually perform data accesses to multiple physical devices. In particular, the servers send data access requests (e.g., read/write commands) to the controller directed to a particular logical disk drive, and the controller translates each request into commands that access data on the physical drives.
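To make the translation step concrete, the following is a minimal sketch of how a request to a logical disk might be mapped onto several physical drives. It assumes simple round-robin striping with a fixed stripe size; the class `VirtualDisk`, its fields, and the drive names are hypothetical, and the actual firmware's translation tables are not described in this document and may work quite differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VirtualDisk:
    """Hypothetical virtual disk striped round-robin over physical drives."""

    drive_ids: Tuple[str, ...]  # physical drives backing this virtual disk
    stripe_blocks: int          # blocks per stripe unit (assumed value)

    def translate(self, lba: int) -> Tuple[str, int]:
        """Map one virtual block address to (physical drive, physical block)."""
        if lba < 0:
            raise ValueError("block address must be non-negative")
        stripe_index, offset = divmod(lba, self.stripe_blocks)
        drive_index = stripe_index % len(self.drive_ids)
        row = stripe_index // len(self.drive_ids)
        return self.drive_ids[drive_index], row * self.stripe_blocks + offset

    def split_request(self, lba: int, count: int) -> List[Tuple[str, int, int]]:
        """Split a virtual read/write into per-drive (drive, start, length) commands."""
        commands = []
        while count > 0:
            drive, physical_block = self.translate(lba)
            # Never let a single command cross a stripe-unit boundary.
            run = min(count, self.stripe_blocks - lba % self.stripe_blocks)
            commands.append((drive, physical_block, run))
            lba += run
            count -= run
        return commands


vdisk = VirtualDisk(drive_ids=("d0", "d1", "d2"), stripe_blocks=4)
commands = vdisk.split_request(lba=2, count=6)
```

In this sketch, a single six-block request starting at virtual block 2 becomes two commands aimed at different physical drives, which mirrors the idea that one request to a logical drive may result in accesses to multiple physical devices.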
As with any complex product, hardware and/or software component failures may occur that inconvenience the users of the product. Such failures may simply be “soft” failures that cause a temporary disruption (i.e., a “glitch”) or, in a worst-case scenario, “hard” failures that cause server outages and network downtime. Soft failures include software or firmware glitches, such as being caught in a software loop, and hardware glitches, such as a temporary loss or degradation of a signal to a component (e.g., an IC). Hard failures, on the other hand, include, for example, corrupted software or degradation of hardware components to the extent that performance is unacceptable or the component is non-operational.
Hard failures are usually not recoverable by simply reinitializing the system. Rather, the system usually needs to be powered down, the failed component replaced, and the system then reinitialized. Soft failures, on the other hand, are usually addressed by first reinitializing the system, before any attempt is made to isolate the failure to a specific component.
However, reinitialization of a SAN system, for example, may take several minutes, since the servers must be powered down during the process. Such extended downtime is inconvenient to the users of the system, since they are denied access to their applications and data for prolonged periods. Therefore, there is a need in the art for improved fault recovery and for reducing the downtime of a SAN system resulting from such faults.