This invention relates generally to apparatus and method for handling a non-responsive device in a computer system where the device non-responsiveness may be due to a powered-down status and not a device failure, and more particularly to such computer systems when the devices are RAID disk drives.
Conventionally, for a computer system having a host processor powered by one switchable power supply, and one or more peripheral devices powered by a second switchable power supply, the order in which the host and the peripheral devices are powered on may affect the start or boot-up procedure. More particularly, if the host is powered on before the peripheral devices, such peripheral devices may either not show up in the configuration, or show up but be identified as non-responding or as being in a similar inactive state.
This situation is particularly an issue in host computer systems which serve as database or information servers, and which typically have a host computer and one or more racks or shelves of rotating disk drive storage devices for storing the information. Customarily, the host computer processor rack and each of the disk drive racks are powered by separate switchable power supplies. Unfortunately, the order and timing of the power-up and power-down of the several racks affects the start-up or boot routine at system initialization, and may cause an error condition on shut-down or power-off.
These conditions have been tolerated in the past by (i) indoctrinating personnel as to the proper power-up and power-down sequence for the host computer and attached devices, (ii) providing a master power-on switch for all of the equipment, or (iii) correcting corrupted or erroneous device or system configuration files after the problem has occurred. Unfortunately, neither of the first two options has been entirely successful, so corruption still occurs, and when such corruption occurs, correction typically requires the intervention of a skilled computer administrator.
The problem is particularly acute relative to RAID disk drives on a server being marked logically off-line, sometimes referred to as simply off-line or “Dead.”
This invention provides structure and method for handling a non-responsive device in a computer system where the device non-responsiveness may be due to a powered-down status rather than a device failure, and more particularly for such computer systems when the devices are RAID disk drives. By scanning devices connected to the computer system over a bus, a count can be made of the devices that do not respond after being signaled during a time interval. After all connected devices have been scanned, if the count equals the number of devices in the configuration, it is likely that a power-down situation has occurred. In this case, the affected devices are indicated as “unavailable” rather than “offline.” If the count does not equal the number of devices in the configuration, it is likely that some devices have failed or are experiencing problems. In this case, the affected devices are indicated to be “offline” rather than “unavailable.” In the event that the devices are determined to be unavailable, the method of the present invention may be repeated as necessary to detect the connected devices once a power-up has been performed.
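By way of illustration only, the counting logic summarized above may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the names `DeviceState`, `classify_devices`, `probe`, and `timeout_s` are hypothetical and do not appear in this disclosure, the bus signaling is abstracted into a caller-supplied `probe` function, and devices that respond are assumed to be reported as online.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class DeviceState(Enum):
    """Hypothetical state labels used only for this sketch."""
    ONLINE = "online"            # device answered the scan (assumed label)
    OFFLINE = "offline"          # silent while others answered: likely failed
    UNAVAILABLE = "unavailable"  # every device silent: likely powered down


def classify_devices(
    configured_ids: Iterable[str],
    probe: Callable[[str, float], bool],
    timeout_s: float = 1.0,
) -> Dict[str, DeviceState]:
    """Scan every configured device and label the non-responsive ones.

    `probe(device_id, timeout_s)` is an assumed callback that signals one
    device over the bus and returns True if it responds within the interval.
    """
    ids = list(configured_ids)

    # Scan all configured devices and collect those that did not respond.
    silent = {dev for dev in ids if not probe(dev, timeout_s)}

    # If the non-responsive count equals the configured count, a power-down
    # is the likely cause; otherwise individual device trouble is likely.
    if ids and len(silent) == len(ids):
        silent_label = DeviceState.UNAVAILABLE
    else:
        silent_label = DeviceState.OFFLINE

    return {
        dev: (silent_label if dev in silent else DeviceState.ONLINE)
        for dev in ids
    }
```

Under these assumptions, a caller that receives only "unavailable" results could invoke `classify_devices` again after a delay, which corresponds to repeating the method until the devices are detected following power-up.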