1. Field of the Invention
The present invention relates to storage systems, and more specifically, to an automated test system for hot swappable field replaceable units (FRUs) in the storage system to test hot swap failure scenarios.
2. Description of the Related Art
Storage systems such as the IBM DS8000 and XIV provide configurable systems for storing computer data. An exemplary storage system includes a rack that provides the physical frame to receive power, processing, storage, networking and other enclosures, and one or more backplanes each having one or more connectors. The power enclosure includes one or more power supplies that draw power from the wall and provide the various power supplies required by different components of the storage system and the backplane. The storage enclosures each include a number of field replaceable units (FRUs) with mating connectors that plug into respective connectors on one or more backplanes. Each FRU includes a latch mechanism (e.g. a catch and lever) for physically locking and unlocking the FRU so that it may be replaced in the field. The processing enclosures each include one or more processing boards, which each include one or more central processing units (CPUs). The CPUs are in electrical communication with the backplane via cables or connectors. All power, data and control signals to and from the FRU's storage device (e.g. a hard disk drive (HDD), a solid state drive (SSD) such as a flash memory card, or a magnetic tape drive) pass through its mating connector. The network enclosure includes a networking switch that facilitates communication between the CPUs and the backplane(s). In some systems the function of the networking switch may be built into the processing boards. The storage system also has one or more network ports (e.g. Ethernet) on the network switch or other components to connect the storage system to an external network.
In the context of storage systems, the term “hot swapping” is used to describe an event of either removing a FRU from or inserting a FRU into the storage system while the system is powered and operational. Hot swapping is commonly used to change the configuration of or repair a working storage system without interrupting its operation. In most storage networks it is simply not feasible to shut down the storage system, taking it out of operation, to remove or replace a FRU. Furthermore, “hot swap” events typically occur without giving any notice to or preparing the storage system for the hot swap event. FRUs and the storage system are designed to support hot swapping.
Storage systems are designed to recognize the occurrence of a hot swap event and to execute the necessary steps in response to the hot swap event to, for example, reconfigure the system in the absence of a particular FRU or to recognize and incorporate a new FRU that expands storage capacity. The storage system's CPUs will generate system message traffic to recognize the hot swap event and to take the necessary steps in response to that event.
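The response described above can be illustrated with a minimal sketch. It is a toy model only, not the behavior of any particular storage system: the class name `StorageSystemModel`, its methods, the slot numbering, and the message strings are all hypothetical and chosen for illustration. It shows a system reconfiguring when a FRU is removed, incorporating a new FRU to expand capacity, and generating a message for each event.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageSystemModel:
    """Toy model of a storage system tracking FRUs by slot (hypothetical)."""
    slots: Dict[int, int] = field(default_factory=dict)  # slot -> capacity in GB
    messages: List[str] = field(default_factory=list)    # system message traffic

    def total_capacity_gb(self) -> int:
        """Capacity contributed by all FRUs currently present."""
        return sum(self.slots.values())

    def on_fru_removed(self, slot: int) -> None:
        """Reconfigure in the absence of the FRU that was in `slot`."""
        if slot not in self.slots:
            self.messages.append(f"WARN slot {slot}: removal reported for empty slot")
            return
        del self.slots[slot]
        self.messages.append(f"INFO slot {slot}: FRU removed, system reconfigured")

    def on_fru_inserted(self, slot: int, capacity_gb: int) -> None:
        """Recognize and incorporate a new FRU, expanding capacity."""
        if slot in self.slots:
            self.messages.append(f"WARN slot {slot}: insertion reported for occupied slot")
            return
        self.slots[slot] = capacity_gb
        self.messages.append(f"INFO slot {slot}: FRU inserted, {capacity_gb} GB added")
```

For example, starting from one 100 GB FRU in slot 0, calling `on_fru_inserted(1, 200)` raises the modeled capacity to 300 GB and appends one message; calling `on_fru_removed(0)` then lowers it to 200 GB and appends a second message.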
Hot swapping can produce a variety of failure scenarios in powered and operational storage systems. Failure scenarios that have occurred include a total system crash resulting in loss of service events, data loss, loss of access to parts of the storage system, backplane failure and failure of storage devices during FRU replacement.
Storage system vendors and storage device manufacturers appreciate the need to rigorously test hot swapping of the FRUs in the storage systems under many different scenarios to test both the design of a storage system and the operation of a particular storage system prior to customer delivery. The vendors and manufacturers want to ensure that the storage system responds to various hot swap scenarios properly and as designed and that any failure scenarios are limited. Testing essentially involves connecting and disconnecting FRUs, and particularly the storage device, from the backplane connector and monitoring the message traffic generated by the storage system to collect diagnostic test data.
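The basic test procedure, repeatedly disconnecting and connecting a FRU while collecting the resulting message traffic, might be sketched as follows. This is an illustrative outline under stated assumptions, not a description of any vendor's test system: `run_hot_swap_test`, `FakeSystem`, and the message strings are hypothetical, and the `disconnect`, `connect` and `read_messages` callables stand in for whatever mechanism (manual, robotic, electronic or software) actually performs the swap and reads the system's messages.

```python
from typing import Callable, Dict, List


def run_hot_swap_test(
    disconnect: Callable[[], None],
    connect: Callable[[], None],
    read_messages: Callable[[], List[str]],
    cycles: int,
) -> Dict[str, List[str]]:
    """Cycle a FRU and collect the message traffic seen after each event."""
    report: Dict[str, List[str]] = {"messages": [], "anomalies": []}
    for cycle in range(cycles):
        for label, action in (("disconnect", disconnect), ("connect", connect)):
            action()
            traffic = read_messages()
            report["messages"].extend(traffic)
            # A system that stays silent after a hot swap event is flagged for review.
            if not traffic:
                report["anomalies"].append(f"cycle {cycle}: no traffic after {label}")
    return report


class FakeSystem:
    """Stand-in for a real storage system so the sketch can run anywhere."""

    def __init__(self) -> None:
        self.present = True
        self._pending: List[str] = []

    def disconnect(self) -> None:
        self.present = False
        self._pending.append("FRU removed")

    def connect(self) -> None:
        self.present = True
        self._pending.append("FRU inserted")

    def read_messages(self) -> List[str]:
        # Return and clear the messages generated since the last read.
        drained, self._pending = self._pending, []
        return drained
```

A run against the stand-in could look like `fake = FakeSystem()` followed by `run_hot_swap_test(fake.disconnect, fake.connect, fake.read_messages, cycles=3)`. Passing the swap mechanism in as callables keeps the loop independent of how the FRU is actually connected and disconnected.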
Vendors currently use a variety of different techniques to connect and disconnect the FRUs. One approach is to have a person physically remove or insert the FRU. This approach allows for considerable flexibility and most closely emulates the conditions of a customer hot swap event. However, this approach is labor intensive, slow and costly, and limits the extent to which various failure scenarios can be practically tested. Another approach is to have a robot or robotic arm physically remove or insert the FRU. This approach is similar to the manual approach and can be automated for more extensive test procedures. However, robotic systems have a high initial cost and a high cost to maintain. Another approach is to install an electronic interface card inside the FRU between the storage device and the mating connector. This approach can be easily integrated into an automated test system and allows for extensive and diverse testing. However, this approach only connects and disconnects the storage devices electrically, not physically. Furthermore, the presence of the interface card between the storage device and mating connector may affect this or other tests. The card must be removed before the storage system is delivered, so that the tested system is not the same as the as-delivered system. Another approach is to simulate the hot swap events in software, causing the storage system to think a hot swap event has occurred and to respond accordingly. This approach can be easily integrated into an automated test system and allows for extensive and diverse testing. However, this approach does not simulate the physical and electrical stresses inherent in physically removing or inserting a FRU.