Computing systems often include a mass storage system for storing data. One popular type of mass storage system is a "RAID" (redundant arrays of inexpensive disks) storage system. A detailed discussion of RAID systems is found in a book entitled The RAID-Book: A Source Book for RAID Technology, published Jun. 9, 1993, by the RAID Advisory Board, Lino Lakes, Minn.
A typical RAID storage system includes a controller and a disk array coupled together via a communication link. The disk array includes multiple magnetic storage disks for storing data.
In operation, the controller of a RAID storage system receives commands (e.g., I/O commands, configuration commands and status commands) from an external host computer. In response to an I/O command, for example, the controller reads data from and writes data to the disks in the disk array and coordinates the data transfer between the disk array and the host computer. Depending upon the RAID level implemented, the controller also generates redundant data and writes it to the disk array. This redundant data enables the user data to be regenerated if one or more disks fail or are removed and data is lost.
A RAID level 1 storage system, for example, includes one or more disks (data disks) for storing data and an equal number of additional "mirror" disks for storing the redundant data. The redundant data in this case is simply a copy of the data stored in the data disks. If data stored in one or more of the data disks is lost, the mirror disks can be used to reconstruct it. Other RAID levels store redundant data for data distributed across multiple disks. If data on one disk is lost, the data and redundant data on the remaining disks are used to reconstruct it.
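The two recovery schemes described above can be illustrated with a small sketch. Mirroring follows directly from the text. For the "other RAID levels," the sketch assumes byte-wise XOR parity, as used for example in RAID level 5; the text does not name a specific scheme, and all function names here are hypothetical.

```python
from functools import reduce


def mirror_write(data_disks, mirror_disks, index, block):
    """RAID level 1 style write: the block goes to a data disk and its mirror."""
    data_disks[index] = block
    mirror_disks[index] = block


def recover_from_mirror(mirror_disks, index):
    """A lost data block is simply read back from its mirror copy."""
    return mirror_disks[index]


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Regenerate the single missing data block of an XOR-parity stripe."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Mirroring example.
data, mirror = [None] * 2, [None] * 2
mirror_write(data, mirror, 0, b"user data")
data[0] = None  # simulate loss of the data disk
assert recover_from_mirror(mirror, 0) == b"user data"

# Parity example: three data blocks in one stripe plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)
assert reconstruct([stripe[0], stripe[2]], parity) == stripe[1]
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity block cancels every block except the missing one.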
Typically, the developer of a RAID storage system will wish to test the device thoroughly before releasing it for public use. Unfortunately, testing a RAID storage system can be very time consuming, and automated test systems have been developed to reduce this burden.
Typically, a RAID storage device is a multitasking computing device; that is, it can process commands and perform a number of functions concurrently. A RAID test system is therefore often used to test the ability of the storage device to operate in a multitasking mode, and for this reason it executes multiple test programs concurrently during a test. Each test program or process generates test commands (e.g., I/O commands, configuration commands and status commands) and transmits them to the RAID storage device being tested. As the storage device responds to these commands, each test program checks for errors and typically terminates when it detects one. The test system also typically includes a recording device (such as a trace buffer or logic analyzer) to record the state of the RAID storage device when an error occurs.
Such automated test systems can be very useful to the developer and can significantly reduce the time spent testing the device compared with manual techniques. A problem arises, however, when one test process detects an error while the other processes do not.
For example, the developer may wish to test the ability of the RAID storage device to perform function A concurrently with function B. For this purpose, the developer writes a first test program (program A) to cause the storage device to perform function A and a second test program (program B) to cause the storage device to perform function B. Each test program expects certain responses from the storage device during the test that indicate the associated function is being properly performed. If, during the test, the test program does not receive the expected response, the execution of the test program terminates.
During the test, the two test programs are executed concurrently so that the RAID storage device performs function A and function B concurrently. Assume, for example, that program A detects that the RAID device failed to perform function A properly. As a result, program A terminates. Program B, however, does not detect this condition, because the RAID storage device continues to perform function B properly. Consequently, program B keeps issuing commands and the state of the RAID device continues to change. Moreover, information about the state of the RAID device at the moment function A failed can be lost as test data is overwritten. This makes reconciling test results difficult at best and sometimes impossible.
One solution to this problem is to develop a master process to coordinate the execution of the test processes. The master process communicates with the test processes using an interprocess communication path. If one test process experiences an error, the master process detects this condition and then terminates the other test processes in order to preserve the state of the storage device under test. Unfortunately, this solution can add a significant amount of complexity to the test system and increase the time required to develop the test system itself.
Accordingly, what is needed is a simple way to synchronize test processes which are concurrently executed by a test system to test a multitasking computing device, such as a RAID storage device.
The present invention is directed to an apparatus for synchronizing test processes which are concurrently executed by a test system to test a multitasking computing device, such as a RAID storage device. Importantly, the apparatus is simple, easy to implement and can significantly reduce the complexity of the test system.
A test system having features of the invention is used for testing a multitasking computing device, such as a RAID storage device. The multitasking computing device responds to a predetermined command, which may be referred to herein as the "inject fault command", by entering a predetermined mode of operation in which the device maintains its present state and all command processing is terminated. As a result, the device becomes non-responsive to further commands.
The test system includes one or more memory devices for storing a plurality of test programs. Each test program is operable, when executed, to (a) test an associated function of the device and (b) transmit the inject fault command to the device if the test fails. In addition, each program is further operable, when executed, to (c) discontinue testing the device if the device becomes non-responsive. The test system further includes a processor for executing the test programs concurrently.
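Steps (a) through (c) can be modeled with a minimal sketch. Everything here is an assumption for illustration: `SimulatedDevice` stands in for the device under test, the string `"INJECT_FAULT"` stands in for the inject fault command, non-responsiveness is signaled by a `DeviceNonResponsive` exception rather than a real timeout, and threads stand in for concurrently executed test programs. No master process is needed; the failing program freezes the device, and every other program stops on its own.

```python
import threading


class DeviceNonResponsive(Exception):
    """Raised when the simulated device no longer answers commands."""


class SimulatedDevice:
    """Hypothetical stand-in for the multitasking device under test."""

    INJECT_FAULT = "INJECT_FAULT"  # hypothetical encoding of the inject fault command

    def __init__(self, failing_function=None):
        self._lock = threading.Lock()
        self._frozen = False
        self.failing_function = failing_function  # function the device gets wrong
        self.state = []  # commands processed so far; frozen along with the device

    def send(self, function, command):
        with self._lock:
            if self._frozen:
                raise DeviceNonResponsive()
            if command == self.INJECT_FAULT:
                self._frozen = True  # keep present state, stop all command processing
                return None
            self.state.append((function, command))
            return "ERROR" if function == self.failing_function else "OK"


def run_test_program(device, function, iterations, results):
    """(a) test one function, (b) inject a fault on failure,
    (c) stop once the device is non-responsive."""
    try:
        for i in range(iterations):
            if device.send(function, f"cmd-{i}") != "OK":
                device.send(function, SimulatedDevice.INJECT_FAULT)
                results[function] = "failed"
                return
        results[function] = "passed"
    except DeviceNonResponsive:
        results[function] = "stopped"


def run_test_system(device, programs):
    """Execute all test programs concurrently and collect their outcomes."""
    results = {}
    threads = [threading.Thread(target=run_test_program, args=(device, f, n, results))
               for f, n in programs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_test_system(SimulatedDevice(failing_function="A"), [("A", 10), ("B", 1_000_000)])` is expected to report program A as `"failed"` and program B as `"stopped"`, with `device.state` left exactly as it was when the fault was injected.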
While executing, each test program tests its associated function of the device by transmitting test commands to the device and receiving responses from it. In one preferred embodiment, the test system is configured to test a RAID storage device.
A RAID storage device embodying the invention includes an I/O port configured to receive commands from an external computer; a disk array having a plurality of data storage disks; and a controller. The controller includes a processor operable to process commands received at the I/O port. In addition, the controller responds to a predetermined command received at the I/O port by placing the processor in a tight loop. While the processor operates in this mode, the RAID storage device maintains its present state and all operative processing is discontinued.
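On the device side, the controller's behavior can be sketched as a command-processing thread that, on receiving the predetermined command, never returns to command processing. This is a hypothetical model, not the controller firmware: the tight loop is approximated by a polling loop on a `threading.Event` so the demonstration does not consume a CPU core, the `_reset` event is a test-harness hook that is not part of the described design, and the host detects non-responsiveness by a reply timeout.

```python
import queue
import threading

INJECT_FAULT = "INJECT_FAULT"  # hypothetical encoding of the predetermined command


class Controller:
    """Hypothetical model of the controller's command-processing loop."""

    def __init__(self):
        self.io_port = queue.Queue()     # commands arriving at the I/O port
        self.processed = []              # stands in for the device's present state
        self._reset = threading.Event()  # harness-only hook, not part of the design
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            command, reply_box = self.io_port.get()
            if command == INJECT_FAULT:
                # Stand-in for the tight loop: the processor spins here and never
                # returns to command processing, so no state changes after this point.
                while not self._reset.wait(timeout=0.01):
                    pass
                return
            self.processed.append(command)
            reply_box.put("OK")

    def submit(self, command, timeout=0.5):
        """Host-side helper: the reply, or None if no reply arrives in time."""
        reply_box = queue.Queue(maxsize=1)
        self.io_port.put((command, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            return None


ctrl = Controller()
print(ctrl.submit("READ"), ctrl.submit("WRITE"))  # OK OK
print(ctrl.submit(INJECT_FAULT, timeout=0.1))     # None: device stops answering
print(ctrl.submit("READ", timeout=0.1))           # None
print(ctrl.processed)                             # ['READ', 'WRITE']
```

Commands queued after the predetermined command are never dequeued, so the recorded state stays exactly as it was when the command arrived.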
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.