1. Field of the Invention
The invention relates generally to testing in storage systems and more specifically relates to improvements in the injection of CRC errors into communications with storage devices in a storage system for testing error recovery in the storage system.
2. Discussion of Related Art
Storage systems typically comprise a storage controller coupled to one or more storage devices. In large scale storage systems, multiple controllers and a large number of storage devices (e.g., disk drives) are typically housed in an enclosure. One or more host systems are coupled to one or more of the storage controllers through storage networking protocols and media. The host systems apply I/O requests to the storage system through the storage controllers which, in turn, apply appropriate I/O operations to one or more of the storage devices within the storage system to write data or to retrieve previously written data.
The storage controllers are adapted to detect errors in the communications with the storage devices and to perform various types of error recovery processing to attempt to recover from the various types of error conditions. One common type of error condition is a Cyclic Redundancy Check (CRC) error detected in exchanges between the storage controller(s) and one or more storage devices. A CRC error represents detection of a single-bit (or multi-bit) error in the communication link between the storage controller(s) and a storage device. In such an exchange, the transmitting device computes a CRC code over the data and transmits it with the data. The receiving device computes its own CRC code based on the data as received and compares its computed CRC code with the received CRC code; a mismatch indicates an error in the transmission or reception of the data. CRC errors may arise in an operational storage system due to electromagnetic noise or other environmental aspects of the operating storage system.
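The compute-transmit-compare sequence described above can be illustrated with a minimal sketch. It assumes, purely for illustration, Python's `zlib.crc32` as a stand-in CRC-32 and a hypothetical frame layout of payload followed by a 4-byte big-endian CRC; actual storage interconnects define their own CRC polynomials and framing. The helper names `make_frame`, `check_frame`, and `flip_bit` are hypothetical.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Transmitter side: append a CRC-32 of the payload (hypothetical 4-byte
    big-endian trailer; zlib's CRC-32 is a stand-in for the link's CRC)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over the payload as received and
    compare it with the CRC code that arrived with the frame."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received_crc


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Model a single-bit transmission error, e.g., one caused by noise."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


frame = make_frame(b"sector data")
print(check_frame(frame))               # True: computed CRC matches received CRC
print(check_frame(flip_bit(frame, 5)))  # False: mismatch, i.e., a CRC error
```

Because any single flipped bit changes either the payload or the transmitted CRC, the comparison fails and the receiver would report a CRC error and begin its recovery processing.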
Design engineers and/or field support engineers dealing with storage systems often need to test the ability of elements of the storage system to properly recover from CRC errors. Design engineers may wish to test the design of their storage controller or storage device to verify proper detection of, and recovery from, CRC errors. In like manner, field support engineers may wish to test the CRC error recovery processing of a storage controller or a storage device to isolate a fault detected in a field installation of a storage system.
A common prior technique for such testing involves inserting a “jammer” device in the communication link between the storage controller(s) and a storage device. The jammer device is physically and electronically inserted between the two components and controllably injects bit errors in the exchanges between the controller(s) and the storage device. These injected bit errors will cause a CRC error to arise in the exchanges between the controller(s) and the storage device and thus enable the engineer to evaluate or debug recovery processing from CRC errors. Jammer devices for injecting CRC errors are well known and widely available for insertion into any of several widely used communication media and protocols.
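The effect of a jammer on the link can be modeled in software to show why each injected bit error surfaces as a CRC error at the receiver. This is a hypothetical sketch only: the `Jammer` class, its `period` parameter (corrupt every Nth frame), and its choice of a random bit position are illustrative assumptions and do not describe any particular commercial jammer. The same stand-in CRC-32 framing via `zlib.crc32` is assumed.

```python
import random
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a stand-in CRC-32 (4 bytes, big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Return True if the received CRC matches the recomputed CRC."""
    return zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "big")


class Jammer:
    """Hypothetical model of a jammer inserted in the link: it passes frames
    through unchanged except every `period`-th frame, in which it flips one
    randomly chosen bit."""

    def __init__(self, period: int, seed: int = 0):
        self.period = period
        self.count = 0
        self.rng = random.Random(seed)  # seeded so test runs are repeatable

    def forward(self, frame: bytes) -> bytes:
        self.count += 1
        if self.count % self.period != 0:
            return frame  # pass through unmodified
        corrupted = bytearray(frame)
        bit = self.rng.randrange(len(corrupted) * 8)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        return bytes(corrupted)


jammer = Jammer(period=4)
crc_errors = 0
for i in range(20):
    frame = make_frame(f"block {i}".encode())
    if not check_frame(jammer.forward(frame)):
        crc_errors += 1  # the receiver's CRC error recovery would start here
print(crc_errors)  # 5: frames 4, 8, 12, 16, and 20 were corrupted
```

In a physical test the corruption happens on the wire rather than in software, which is why the jammer must be inserted into the link itself.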
Ad hoc insertion of a jammer device to test CRC error recovery gives rise to a number of problems. Insertion of a jammer in a storage system may cause physical/mechanical problems in that the jammer may not physically fit in the nominal mounting structure of the storage device to which it is to be attached. For example, in the context of larger storage systems, storage devices (e.g., disk drives) are typically mounted into a tray or carrier so that they may be readily inserted into and removed from the storage system enclosures (e.g., for "hot swap" functionality), and a jammer placed between the storage device and its connection may not fit within such a tray or carrier. Further, electronic insertion of the jammer into the nominal connection between the storage controller(s) and the storage device alters the electronic characteristics of the coupling such that signal timings may change and other unintended errors may thereby be introduced. Still further, to test each storage device in a large storage system, a jammer would have to be inserted for each drive to be tested. Thus either a single jammer would have to be inserted, removed, and re-inserted numerous times to test each of a large number of drives, or a large number of jammer devices would have to be provided at significant cost. Other problems also arise in the use of such jammer devices; for example, they offer little flexibility to alter the style of testing or to adapt to the specific testing needs of a particular application.
Thus it is an ongoing challenge to flexibly and effectively test CRC error recovery in storage systems.