Various forms of network-based storage systems exist today, including network-attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up critical data (e.g., by data mirroring), and the like.
A network-based storage system typically includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (“clients”). A storage server may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files. The files may be stored in a storage subsystem that includes one or more arrays of mass storage devices, such as magnetic or optical disks or tapes. The devices may be managed using RAID (Redundant Array of Inexpensive Disks), in which case the mass storage devices in each array may be organized into one or more separate RAID groups. A storage server provides clients with file-level access. Some storage servers may additionally provide block-level access.
Current filers are generally packaged in one of two main forms: 1) an all-in-one custom-designed system that is essentially a standard computer with built-in disk drives, all in a single chassis (“enclosure”), or 2) a modular system in which one or more sets of disk drives, each in a separate chassis, are connected to an external filer in another chassis. A modular system can be built up by adding multiple chassis in a rack, and then cabling the chassis together. The disk drive enclosures in a modular system are often called “shelves” or “storage shelves.”
To improve the reliability of storage shelves, it is generally necessary to test how the shelf hardware behaves under various fault conditions. Such testing may be conducted at design validation time, after shipment of the final product, or both. The fault conditions may be caused by a failure in the microprocessors, shelf electronics, or communication links in a storage shelf. When a fault occurs in a storage shelf, a report is sent to the filer for analysis and for invoking corrective measures. Conventionally, fault conditions are tested by running hundreds of thousands of test patterns, in the hope that some of the test patterns will trigger a fault condition. There is no guarantee that any of the test patterns will cause a specific fault to occur. Thus, the conventional technique is time-consuming and cannot fully validate the handling of specific fault conditions.
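The limitation of the conventional technique can be illustrated with a minimal sketch. It is a toy model only and does not represent any actual shelf test procedure: the function names, the fault name "link_failure", the size of the pattern space, and the assumption that patterns are drawn uniformly at random are all hypothetical.

```python
import random


def run_random_patterns(num_patterns, pattern_space, fault_triggers, seed=0):
    """Apply randomly chosen test patterns and return the faults observed.

    fault_triggers maps a (hypothetical) fault name to the set of pattern
    values that happen to provoke that fault in this toy model.
    """
    rng = random.Random(seed)
    observed = set()
    for _ in range(num_patterns):
        pattern = rng.randrange(pattern_space)
        for fault, triggers in fault_triggers.items():
            if pattern in triggers:
                observed.add(fault)
    return observed


def miss_probability(num_patterns, pattern_space, num_triggers):
    """Probability that a run of uniformly random patterns never provokes
    a fault that only num_triggers of the possible patterns can cause."""
    return (1 - num_triggers / pattern_space) ** num_patterns


if __name__ == "__main__":
    # Assumed scenario: one pattern out of 2**32 provokes the fault of interest.
    faults = {"link_failure": {123_456_789}}
    seen = run_random_patterns(200_000, 2**32, faults)
    print("faults observed:", seen)
    print("chance of missing the fault:", miss_probability(200_000, 2**32, 1))
```

Under these assumptions, a run of 200,000 patterns misses the fault with a probability above 0.9999, which is the sense in which pattern-based testing cannot guarantee that a specific fault condition is ever exercised.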