Conventional computer systems, particularly computer systems that operate as server systems, frequently provide a plurality of hard disk drive storage devices configured as an array. Frequently, the array of disk drives is configured using one of the conventional RAID techniques to enhance data reliability. Several levels of RAID (for example, RAID 1 through RAID 5) are known in the art and not described here in detail. These RAID disks are conventionally disposed mechanically and electrically in a protective housing which provides mechanical mounting, power supply, cooling (typically with one or more rotating fans), interface connections to the host computer via a host adapter often through a pluggable SCSI connector, and information display means such as colored LEDs, Liquid Crystal Display (LCD) to provide status information, audible alarms, and the like.
Frequently, the RAID disk drive array is disposed in one or more enclosures that meet the SCSI Accessed Fault-Tolerant Enclosures (SAF-TE) Interface Specification. The objective of this Interface Specification is to provide a standard way for disks and disk controllers (especially RAID controllers) to be automatically integrated with peripheral packaging to support status signals (including LEDs, LCD displays, audible alarms, temperature sensing, etc.), hot swapping of hard disk drives, and monitoring of components within the disk drive enclosure. Standardization permits a system vendor to integrate alternative third party controllers, disk drives, and peripheral packaging knowing that they will operate in a predictable manner and knowing that a selected controller will work with a variety of components that comply with the standard, either at the time of initial integration or later during product revision or upgrade. Revision 1.0 (17 Oct. 1995) of this SAF-TE Interface Specification and the SAF-TE Addendum Sheet updated 11 Jul. 1996 are incorporated herein by reference.
The SAF-TE standard is currently implemented on a SCSI microprocessor device, and SCSI provides the underlying transport mechanism for communicating enclosure information, so that standard SCSI host adapters will work in the SAF-TE environment and no special considerations, such as reserved signals on the SCSI bus, need be anticipated. So-called “target devices” that implement the SAF-TE Interface Specification are collectively referred to as the SAF-TE Processor (SEP) device. In the SAF-TE context, all communication is initiated by the host and the SAF-TE Processor device acts only in a target role. The SAF-TE Processor device should conform to the ANSI SCSI-2 specification for processor devices. (The ANSI SCSI-2 Specification Version is hereby incorporated by reference.)
A brief description of selected SCSI Commands and Messages is provided so that the context of the invention may be understood more readily; however, additional details are available in the afore referenced SAF-TE Interface and ANSI SCSI-2 specifications.
SAF-TE conventionally supports six SCSI commands: WRITE BUFFER, READ BUFFER, INQUIRY, TEST UNIT READY, SEND DIAGNOSTIC, and REQUEST SENSE. Receipt by the SEP devices of a command with any other operation code (opcode) will be interpreted as an Invalid CDB Operation Code and result in a Check Condition within a SAF-TE.
The SAF-TE is a polling based interface, and while the SAF-TE Interface Specification does not place any formal restriction on polling frequency, the specification states that it expects most implementors to poll the SEP once every two to ten seconds. The specification also recommends that the maximum response time of the SEP device to any WRITE BUFFER or READ BUFFER command should be less than two milliseconds, and that the maximum recovery time of the SEP device from a SCSI bus reset should be 30 milliseconds.
A SCSI Message phase allows informational messages to be exchanged between an initiator (e.g. the host) and a target (e.g. the SEP). SCSI Messages supported include ABORT, BUS DEVICE RESET, COMMAND COMPLETE, IDENTIFY, INITIATOR DETECTED ERROR, MESSAGE PARITY ERROR, MESSAGE REJECT, and NO OPERATION. These messages are described in greater detail in the SCSI-2 Specification.
While the drafters of the SAF-TE interface specification suggested in 1995 that minimal performance impact was to be expected because of the short duration (for example, a few milliseconds) and low frequency (for example, an interval of about two to about ten seconds) of the polling, the impact is significant in implementations where the polling cycle time is 100 ms or longer. The impact has become more severe as the initial response time to polling of the SAF-TE unit-ready status has proved longer than expected and as the number of SAF-TE enclosures required on each SCSI bus or channel has grown. The increase in processor speeds and the increasing demands on network servers have also made the problem worse. For example, the effect on a typical 1995 RAID configuration (e.g. two SCSI channels, each channel having one SAF-TE enclosure) was typically about two percent [(100 ms/channel)×(2 channels)÷10000 ms=0.02]; while for a high performance RAID system in 1998–1999 (e.g. 3 to 4 SCSI channels, each channel having three SAF-TE enclosures) the impact is in the nine- to twelve-percent range [3×(100 ms/SAFTE)×(3 SAFTE)÷10000 ms=0.09 for three channels; 4×(100 ms/SAFTE)×(3 SAFTE)÷10000 ms=0.12 for four channels]. Therefore, there is a need for a structure and method that can reduce the impact of polling. At the Ultra-2 speed of 80 MB/sec, about 26 Gigabytes (GB) of additional data can be transferred in one hour using the bandwidth that conventional SAF-TE polling would otherwise consume [0.09×80 MB/sec×3600 sec≈26 GB]. It is highly desirable for applications such as video-on-demand and video or other image streaming to have this additional capacity. One industry study has shown that on a Digital Equipment Corporation (DEC) video server using 200 MPEG-2 (420 KB/sec) video streams, a two-hour video movie requires the transfer of 4.2 GB of data. Therefore, the ability to transfer approximately 52 Gigabytes (GB) of additional data over a two-hour time period is enough to communicate twelve additional movies using the same equipment, or to meet the same need using less equipment.
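The bracketed arithmetic above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that every enclosure is polled once per 10-second interval, that polls are serialized, and that 1 GB is taken as 1000 MB.

```python
def polling_overhead(channels, enclosures_per_channel,
                     poll_ms=100.0, interval_ms=10_000.0):
    """Fraction of each polling interval spent on SAF-TE polling.

    Assumes one poll per enclosure per interval and no overlap between polls.
    """
    return channels * enclosures_per_channel * poll_ms / interval_ms


def reclaimable_gb(overhead, bus_mb_per_s=80.0, seconds=3600.0):
    """Data (in GB, 1 GB = 1000 MB) that fits in the time spent polling."""
    return overhead * bus_mb_per_s * seconds / 1000.0


if __name__ == "__main__":
    for label, channels, enclosures in [
        ("1995, 2 channels x 1 enclosure", 2, 1),
        ("1998-1999, 3 channels x 3 enclosures", 3, 3),
        ("1998-1999, 4 channels x 3 enclosures", 4, 3),
    ]:
        overhead = polling_overhead(channels, enclosures)
        print(f"{label}: {overhead:.0%} overhead, "
              f"{reclaimable_gb(overhead):.1f} GB per hour at 80 MB/sec")
```

Under these assumptions the three-channel case corresponds to the figure of about 26 GB per hour given above.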
The SAF-TE interface is now briefly described. The SAF-TE interface is a set of processor device commands that the host system may use to request specific actions of the target processor. These processor device commands fall into two general categories: (1) commands that request some action to be performed in the enclosure, which are sent to the SEP device with a WRITE BUFFER operation, and (2) commands that request information from the SEP device, which are sent to the SEP device using a READ BUFFER command. As there are different types of conventional READ BUFFER and WRITE BUFFER data packets, each of these commands provides distinguishing single-byte opcodes to implement the desired functionality. Opcodes in the range of 00h to 7Fh are reserved for standardized commands, while opcodes in the range of 80h to FFh are open and available for vendor specific use under the standard. READ BUFFER and WRITE BUFFER commands are now summarized.
READ BUFFER commands include: Read Enclosure Configuration (00h), Read Enclosure Status (01h), Read Usage Statistics (02h), Read Device Insertions (03h), and Read Device Slot Status (04h). WRITE BUFFER commands include: Write Device Slot Status (10h), Set SCSI ID (11h), Perform Slot Operation (12h), Set Fan Speed (13h), Activate Power Supply (14h), and Send Global Command (15h). Some of the commands are mandatory under the SAF-TE standard while others are optional.
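The opcode assignments listed above lend themselves to a simple lookup. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and only the opcode values and command names are taken from the description above.

```python
# Standardized SAF-TE opcodes as listed above, keyed by single-byte opcode.
READ_BUFFER_OPCODES = {
    0x00: "Read Enclosure Configuration",
    0x01: "Read Enclosure Status",
    0x02: "Read Usage Statistics",
    0x03: "Read Device Insertions",
    0x04: "Read Device Slot Status",
}
WRITE_BUFFER_OPCODES = {
    0x10: "Write Device Slot Status",
    0x11: "Set SCSI ID",
    0x12: "Perform Slot Operation",
    0x13: "Set Fan Speed",
    0x14: "Activate Power Supply",
    0x15: "Send Global Command",
}


def classify_opcode(opcode):
    """Return 'standard' for 00h-7Fh and 'vendor-specific' for 80h-FFh."""
    if not 0x00 <= opcode <= 0xFF:
        raise ValueError("a SAF-TE opcode is a single byte")
    return "standard" if opcode <= 0x7F else "vendor-specific"


def describe(opcode):
    """Return (SCSI command, SAF-TE command name), or None if not listed."""
    if opcode in READ_BUFFER_OPCODES:
        return ("READ BUFFER", READ_BUFFER_OPCODES[opcode])
    if opcode in WRITE_BUFFER_OPCODES:
        return ("WRITE BUFFER", WRITE_BUFFER_OPCODES[opcode])
    return None


if __name__ == "__main__":
    for op in (0x01, 0x04, 0x10, 0x80):
        print(f"{op:02X}h", classify_opcode(op), describe(op))
```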
Conventionally, standard SCSI Host Adapters (HAs) including most (if not all) Redundant Array of Independent Disk (RAID) controllers communicate regularly with all attached SCSI Accessed Fault-Tolerant Enclosures (SAF-TE) using a well-defined protocol. Each SAF-TE supports status signals (LEDs, audible alarm, LCD, etc.), hot swapping of hard disk drives, and monitoring of all enclosure components.
All communication is initiated by the host. The SAF-TE's Processor (SEP) device acts only in a target role. Asynchronous Event Notification is not used. The SEP device is periodically polled by the host through the host adapter to detect changes in status.
Drive failure indications are controlled by the host (adapter) through a command set because it is the host that knows if a drive has failed. Status indicators for other components, such as fans and power supplies, are automatically controlled by the SEP device. Drive insertion into or removal from a slot is sensed first by the SEP device.
Although there is no specific restriction on SEP polling frequency, host adapters continuously poll the SEP once every few seconds (generally every 10 seconds or less). The time it takes to poll a SAF-TE enclosure depends on the implementation and includes the time to execute a typical set of five commands that accomplish the polling. Empirical measurements have shown that the total execution time for available devices varies between about 60 millisec and about 250 millisec, with a typical time of about 100 millisec. For example, these commands and typical execution times are as follows: “Test Unit Ready” (10 millisec), “Read Buffer” for Device Slot Status (25 millisec), “Test Unit Ready” (10 millisec), “Read Buffer” for Enclosure Status (25 millisec), and “Write Buffer” for Device Slot Status (25 millisec).
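The typical figure of about 100 millisec follows from summing the five per-command times given above. The following Python sketch is illustrative only; the names are hypothetical and the per-command times are the example values from the text, not measurements of any particular device.

```python
# Example per-command execution times (millisec) from the text above.
POLL_SEQUENCE_MS = [
    ("TEST UNIT READY", 10),
    ("READ BUFFER (Read Device Slot Status)", 25),
    ("TEST UNIT READY", 10),
    ("READ BUFFER (Read Enclosure Status)", 25),
    ("WRITE BUFFER (Write Device Slot Status)", 25),
]


def poll_time_ms(sequence=POLL_SEQUENCE_MS):
    """Total time to poll one enclosure once, in millisec."""
    return sum(ms for _, ms in sequence)


def bus_busy_fraction(poll_ms, interval_s):
    """Fraction of a polling interval consumed by one enclosure's poll."""
    return poll_ms / (interval_s * 1000.0)


if __name__ == "__main__":
    total = poll_time_ms()
    print(f"one poll: {total} millisec")
    # Compare the example sum with the measured extremes quoted in the text.
    for ms in (60, total, 250):
        print(f"{ms} millisec every 10 sec: {bus_busy_fraction(ms, 10):.2%}")
```

The example times sum to 95 millisec, consistent with the typical value of about 100 millisec.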
With a current 16-bit wide SCSI bus (also referred to as a SCSI channel), up to fourteen (14) drive slots (each having or configurable for a single disk drive) may be used. Although it is theoretically possible to have seven (7) single-slot enclosures for the entire bus, most commercially available SAF-TE enclosures provide 4 to 6 drive slots each, and three such enclosures may share the entire bus. A SAF-TE enclosure with four to six slots is commercially desirable because of the ease of packaging, powering, and cooling the devices within the enclosure. By contrast, seven (7) single-slot enclosures are not commercially desirable because each enclosure holds only one disk drive, and a single fourteen-slot (14) enclosure is in practical terms unmanageable in terms of power, cooling, and packaging.
Host adapters that each control multiple (two, three, or potentially more) SCSI channels are plentiful. Therefore, when a significant number of drives is required, as is the trend in the ever-expanding storage subsystems used in contemporary client/server computing, it is not unusual for a host adapter to poll nine (or more) SAF-TE enclosures constantly for status changes (for example, three channels each having three SAF-TE enclosures which themselves have four disk drives each). A maximum of sixteen bits are available, of which three bits are used for the SEP devices and twelve bits for the target IDs, so that three enclosures with four disks per enclosure may be supported. FIG. 2 illustrates an exemplary configuration of a host 32 coupled for communication to Host Adapter (HA) 34, where Host Adapter 34 provides first and second SCSI bus 36, 38 connections to two sets of three SAF-TE enclosures 41 (41a, 41b, 41c, 41d, 41e, 41f).
In spite of the frequent polling of each SAF-TE enclosure, the changes that may be detected as a result of the polling (for example, changes in drive operational status, drive slot status, or SAF-TE component status) do not occur frequently. In fact, in a system that has reached normal operating temperature and is operating normally and without intentional operator intervention (such as removal or insertion of a disk drive), no significant change will typically occur for periods of weeks to months or longer. The mean time between failures (MTBF) for a disk drive used in server applications is several thousand hours. Of course, while some events are rare or infrequent (such as a device, cooling fan, or other component failure), it may generally be important to receive notification of such an event quickly so that appropriate remedial action can be taken. Conventional SEP polling under the SAF-TE Interface standard as described above is therefore largely non-productive: the SCSI bus bandwidth is poorly utilized and the host adapter resource is significantly misused.
Furthermore, maintaining the prescribed timing restrictions in order to minimize the impact on system performance, such as the specification's recommendation that the maximum response time of the SEP device to any status-passing WRITE BUFFER or READ BUFFER command be kept below 2 milliseconds, is likely to increase the overall cost of SAF-TE hardware. Many systems do not meet this recommendation and may have a response time on the order of about 25 millisec. Hardware costs increase because a faster microprocessor and higher speed memory, and in some instances hard-wired logic, may be required.
The SCSI Accessed Fault-Tolerant Enclosures Interface Specification recommends that the host adapter continuously poll the SEP device once every 10 seconds or less. This polling is ordinarily performed by issuing to the SEP a sequence of five commands (TEST UNIT READY, READ BUFFER, TEST UNIT READY, READ BUFFER, WRITE BUFFER), as listed in Table I, with appropriate parameters or arguments, and expecting an immediate response (that is, a response typically on the order of about 30 milliseconds or less) without disconnection or tagged command queuing.
TABLE I
List of SEP commands to implement SAF-TE Recommended Periodic Polling
(Each byte of the Command Descriptor Block is shown in hex; byte 0 is on the left)

SCSI Command Name   Command Descriptor Block        Remarks
TEST UNIT READY     00 00 00 00 00
READ BUFFER         3c 01 04 00 00 00 00 00 40 00   Read Device Slot Status command
TEST UNIT READY     00 00 00 00 00
READ BUFFER         3c 01 01 00 00 00 00 00 80 00   Read Enclosure Status command
WRITE BUFFER        3b 01 00 00 00 00 00 00 nn 00   Write Device Slot Status command (where nn = No. slots × 3 + 1)
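The polling sequence of Table I may be expressed as a list of Command Descriptor Blocks in which only the nn byte of the WRITE BUFFER command depends on the enclosure. The following Python sketch is illustrative only; the function names are hypothetical, and the byte strings are copied from the table as printed.

```python
def write_slot_status_length(num_slots):
    """Transfer length nn for Write Device Slot Status: No. slots x 3 + 1."""
    return num_slots * 3 + 1


def poll_sequence(num_slots):
    """Return the five (command name, CDB bytes) pairs of Table I."""
    nn = write_slot_status_length(num_slots)
    if num_slots < 1 or nn > 0xFF:
        raise ValueError("nn must fit in the single byte shown in Table I")
    # TEST UNIT READY is copied verbatim from Table I, which prints five
    # bytes; a standard SCSI-2 TEST UNIT READY CDB is six bytes long.
    test_unit_ready = bytes.fromhex("00 00 00 00 00")
    return [
        ("TEST UNIT READY", test_unit_ready),
        ("READ BUFFER", bytes.fromhex("3c 01 04 00 00 00 00 00 40 00")),
        ("TEST UNIT READY", test_unit_ready),
        ("READ BUFFER", bytes.fromhex("3c 01 01 00 00 00 00 00 80 00")),
        ("WRITE BUFFER",
         bytes.fromhex("3b 01 00 00 00 00 00 00") + bytes([nn, 0x00])),
    ]


if __name__ == "__main__":
    for name, cdb in poll_sequence(num_slots=4):
        print(f"{name:16s}", " ".join(f"{b:02x}" for b in cdb))
```

For a four-slot enclosure, nn is 13 (0Dh), so the WRITE BUFFER transfer length byte becomes 0d.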
The SCSI READ BUFFER Command Descriptor Block (CDB) has the structure illustrated in Table I. The READ BUFFER command is used to receive a data packet from the SEP device in a DATA IN phase. These data packets are the means of transferring enclosure status information to the initiator (HA). The fields in the conventional READ BUFFER command are known in the art, and their contents are addressed only briefly here. The 8-bit Operation Code (Opcode) field in byte 0 should hold 3Ch (3C hexadecimal); in byte 1, the 3-bit Logical Unit Number field should hold 000b (000 binary), the two-bit Reserved field contains 00b, and the three-bit Mode field holds 001b (01h) to indicate that the data buffer is in the SAF-TE command format. The Buffer ID in byte 2 determines the content and format of the data packet to be transferred to the initiator during the data phase. If it is set to 01h, the CDB is a Read Enclosure Status command. If it is set to 04h, the CDB is a Read Device Slot Status command. Bytes 3–6 are each set to 00h (zero) to signify that the bytes are unused. Transfer Length (MSB in byte 7, LSB in byte 8) is the size of the data packet (in bytes) to be transferred in the data phase of this command. Byte 9 of the READ BUFFER CDB is set to 00h to signify that it is unused.
The SCSI WRITE BUFFER command descriptor block has a similar format, except that the operation code is 3Bh and byte 2 is set to 00h instead of being used as a Buffer ID. The contents of the other bytes depend on the particular type or mode of WRITE BUFFER operation, for example, whether a WRITE SEP DEVICE Command (Mode 01h) as illustrated in Table II or an UPLOAD FIRMWARE Command (Mode 04h) is issued. The structure of a WRITE BUFFER Write SEP Device Command is illustrated in Table II; the structures of the WRITE BUFFER command for other modes are known in the art and are also described in the SCSI Accessed Fault-Tolerant Enclosures Specification, which is incorporated herein by reference.
The Read Enclosure Status command is used by the host to find the operational status of the components of the enclosure. It causes the SEP device to transfer to the host adapter operational status information on components in the enclosure, such as fans, power supplies, temperature sensors, temperature out of range indicators, door locks, SCSI ID mapping for drive slots, and the like. The host is expected to pass the component status information to the user for corrective action if necessary. Corrective action may include, for example, replacing a failing cooling fan, power supply, or disk drive. The existence and number of such components included in the enclosure are indicated by means of the Read Enclosure Configuration command, another derivative of the READ BUFFER command, which is issued by the HA at power-on time or after a system reset.
The Read Device Slot Status command causes the SEP device to transfer for each slot, four bytes of drive and drive slot status information to the host adapter. The first three bytes of these four bytes are defined exactly the same as those for the Write Device Slot Status command and generally duplicate what is transmitted by the host adapter on the preceding Write Device Slot Status command for that slot, except upon powering up the host system. The fourth byte indicates whether a drive is inserted in the particular slot or not, whether the slot is ready for insertion/removal, and whether the slot is prepared for operation.
The Write Device Slot Status command causes the host adapter to transfer to the SEP device three bytes of drive status information for each device slot. That information includes the drive state and configuration setup, drive operational status, drive error conditions and the state of array, if any, in which the drive is a configured member. The state of the array may, for example, be normal, critical (where one member drive has failed), or off-line.
TABLE I
Structure of READ BUFFER CDB

Byte   Bits 7–5               Bits 4–3         Bits 2–0
0      Operation Code (3Ch)
1      Logical Unit Number    Reserved (00h)   Mode (01h)
2      Buffer ID
3–6    00h (not used)
7      Transfer Length (MSB)
8      Transfer Length (LSB)
9      00h
TABLE II
Structure of WRITE BUFFER - Write SEP Device Command

Byte   Bits 7–5               Bits 4–3         Bits 2–0
0      Operation Code (3Bh)
1      Logical Unit Number    Reserved         Mode (01h)
2      00h
3      00h
4      00h
5      00h
6      00h
7      Transfer Length (MSB)
8      Transfer Length (LSB)
9      00h
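The ten-byte layout shared by the READ BUFFER and WRITE BUFFER command descriptor blocks can be packed and unpacked as shown in the following Python sketch. It is illustrative only: the type and function names are hypothetical, and the bit positions within byte 1 (Logical Unit Number in bits 7 to 5, Reserved in bits 4 to 3, Mode in bits 2 to 0) are assumed from the field widths given in the description.

```python
from typing import NamedTuple

READ_BUFFER = 0x3C
WRITE_BUFFER = 0x3B


class BufferCdb(NamedTuple):
    opcode: int           # byte 0
    lun: int              # byte 1, bits 7-5
    mode: int             # byte 1, bits 2-0
    buffer_id: int        # byte 2 (00h for WRITE BUFFER)
    transfer_length: int  # bytes 7 (MSB) and 8 (LSB)


def pack_buffer_cdb(opcode, buffer_id, transfer_length, lun=0, mode=0x01):
    """Build a 10-byte READ BUFFER or WRITE BUFFER CDB."""
    if not 0 <= transfer_length <= 0xFFFF:
        raise ValueError("transfer length must fit in two bytes")
    byte1 = ((lun & 0x07) << 5) | (mode & 0x07)  # reserved bits 4-3 stay 0
    return bytes([opcode, byte1, buffer_id, 0, 0, 0, 0,
                  transfer_length >> 8, transfer_length & 0xFF, 0])


def unpack_buffer_cdb(cdb):
    """Split a 10-byte CDB into the fields named in Tables I and II."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return BufferCdb(cdb[0], cdb[1] >> 5, cdb[1] & 0x07, cdb[2],
                     (cdb[7] << 8) | cdb[8])


if __name__ == "__main__":
    # Read Device Slot Status: Buffer ID 04h, transfer length 0040h.
    cdb = pack_buffer_cdb(READ_BUFFER, buffer_id=0x04, transfer_length=0x40)
    print(" ".join(f"{b:02x}" for b in cdb))
    print(unpack_buffer_cdb(cdb))
```

Packing the Read Device Slot Status example reproduces the CDB bytes listed for that command in the polling sequence, namely 3c 01 04 00 00 00 00 00 40 00.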
Each SAF-TE enclosure actually contains LEDs, audible alarms, etc., which are activated as a result of the status flags set/reset through the Write Device Slot Status command.
The fourth byte of the status information transferred by the Read Device Slot Status command indicates whether a drive replacement has occurred (that is, a drive was removed and another drive inserted) and whether the drive is prepared for operation. A drive insertion also causes bit seven in the first byte, referred to as the “unconfigured” flag, to be set. This serves as an indication that the drive was replaced, so that a drive swap is not ignored. When the host adapter sees that the “unconfigured” flag has been set to “1” by the SEP device, it may trigger some action on the newly inserted drive, such as rebuilding the newly inserted drive with reconstructed data calculated from what is recorded on the other members of the redundant drive array (e.g., a RAID array) if the drive in that slot was so configured and went off-line, or making the new drive a “hot spare” for an array. In other instances the host adapter will not trigger any particular action; for example, no action is generally triggered when the SEP returns Check Condition status in response to a command. Note that all commands listed above (Inquiry, Test Unit Ready, Read Buffer, Write Buffer, Request Sense) are subject to the normal “command completion” timeout (typically on the order of about five to ten seconds) for failure to complete promptly; that is, to complete within a timeout period established by the host adapter.
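The handling of the “unconfigured” flag described above may be sketched as follows. This Python example is illustrative only: the names are hypothetical, the data packet is assumed to consist solely of consecutive four-byte slot records, and bit seven is assumed to be the most significant bit (mask 80h) of the first byte of each record. The remaining bits are left undecoded because their layout is defined in the SAF-TE Interface Specification rather than here.

```python
from typing import List, NamedTuple


class SlotStatus(NamedTuple):
    slot: int            # slot index within the data packet
    status: bytes        # first three bytes (same layout as Write Device Slot Status)
    slot_byte: int       # fourth byte (insertion / readiness information)
    unconfigured: bool   # bit seven of the first byte


def parse_slot_status(payload: bytes) -> List[SlotStatus]:
    """Split a Read Device Slot Status packet into four-byte slot records."""
    if len(payload) % 4 != 0:
        raise ValueError("expected four bytes per slot")
    records = []
    for offset in range(0, len(payload), 4):
        chunk = payload[offset:offset + 4]
        records.append(SlotStatus(
            slot=offset // 4,
            status=chunk[:3],
            slot_byte=chunk[3],
            unconfigured=bool(chunk[0] & 0x80),  # assumed: bit 7 is the MSB
        ))
    return records


def slots_with_new_drive(payload: bytes) -> List[int]:
    """Slots whose 'unconfigured' flag is set, i.e. candidates for host action."""
    return [r.slot for r in parse_slot_status(payload) if r.unconfigured]


if __name__ == "__main__":
    # Hypothetical two-slot packet: slot 0 flagged, slot 1 not flagged.
    packet = bytes([0x80, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01])
    print(slots_with_new_drive(packet))
```

A host adapter could use such a list to decide which slots to rebuild or designate as hot spares, subject to the configuration of the array.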
The current main means for exchanging status between the host adapter and SAF-TE enclosures is regularly polling the SEP device by the host adapter at predetermined intervals, for example at two to ten second intervals, where typically the SEP device is polled at least once every 10 seconds, as described above.