This invention relates to the testing of device drivers, and in particular to testing the hardening of device drivers.
Traditionally, device drivers have been written with an emphasis on good performance and correct operation in the absence of faults. Driver writing guides and training material make little, if any, mention of how to deal with faulty hardware devices. A faulty device can and often will cause the system to crash. As a result, a $50 PCI card can crash a $500,000 server.
A standard approach to improving system availability in the face of I/O faults is to have the system crash, then reboot and configure the faulty device out of the system. This approach is not always acceptable, however.
A much better approach is to modify drivers to survive I/O faults and reconfigure the device without a reboot. A device driver which has been designed to be resilient against such failures is known as a “hardened” driver. A hardened device driver is defined as being a device driver with the minimal potential of compromising the integrity of the system of which it is part.
Driver hardening techniques have the potential to contribute greatly to system Reliability, Availability & Serviceability (RAS). Hardened device drivers reduce the potential for defective devices to cause a totally disruptive system loss. The failed component can then be replaced as part of scheduled maintenance. To harden a device driver in this way, a designer has to consider the many implications that failing hardware may have for the driver code.
The philosophy behind successful driver hardening is one of total paranoia. A defective device can be thought of as containing a ‘malicious saboteur’ whose ambition is to completely disrupt the server system of which it is part. It may attempt this in a range of devious and inventive ways. It may refuse to respond to accesses, so causing bus time-out exceptions. It may seek to totally absorb a processor in servicing hoax interrupts. It may attempt to dupe the system kernel into undertaking suicidal action. It may simply go quiet and withhold vital services. It may corrupt the data which it delivers.
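By way of illustration only, one way a driver might contain the ‘hoax interrupt’ behaviour described above is to count consecutive interrupts for which the device had no work pending, and to stop servicing the device once a limit is reached. The following is a minimal sketch under stated assumptions: the names `irq_guard`, `irq_guard_update` and `SPURIOUS_LIMIT`, and the limit of 100, are hypothetical and are not taken from any particular operating system or driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on consecutive interrupts that carry no work. */
#define SPURIOUS_LIMIT 100u

/* Hypothetical per-device bookkeeping kept by the driver. */
struct irq_guard {
    uint32_t spurious;  /* consecutive interrupts with no work pending */
    bool     disabled;  /* set once the device is treated as faulty */
};

/*
 * Called once per interrupt. had_work says whether the device really
 * had something to service. Returns true if the driver should keep
 * servicing the device, false if the device should be masked and
 * reported as faulty.
 */
bool irq_guard_update(struct irq_guard *g, bool had_work)
{
    assert(g != NULL);

    if (g->disabled) {
        return false;           /* once faulty, stay faulty */
    }
    if (had_work) {
        g->spurious = 0;        /* genuine interrupt resets the count */
        return true;
    }
    if (++g->spurious >= SPURIOUS_LIMIT) {
        g->disabled = true;     /* too many hoax interrupts in a row */
        return false;
    }
    return true;
}
```

In such a sketch the interrupt handler would call `irq_guard_update` on every interrupt and, on a false return, mask the device's interrupt and begin fault handling, so that a defective device cannot absorb a processor indefinitely.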
The hardened driver must be able to identify and contain a fault rapidly. Timely detection is necessary if the implications of a device failure are to be controlled. Preservation of system integrity requires that faults are detected before they uncontrollably alter the system state. Consequently, steps must be taken to test for faults whenever data returned from the device is going to be ‘used’ by the system.
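As an illustrative sketch only, testing device-returned data before it is ‘used’ might take the form of a validation routine applied to every completion record the device writes. The descriptor layout, the status bits, and the names `rx_desc`, `rx_desc_check`, `RX_BUF_SIZE` and `RX_RING_SIZE` below are assumptions made for the example and do not describe any real device. The all-ones test reflects the value commonly returned when a read to a non-responding PCI device fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver limits; a real driver would use its own. */
#define RX_BUF_SIZE   2048u  /* size of each receive buffer in bytes */
#define RX_RING_SIZE  64u    /* number of slots in the receive ring */

/* Hypothetical status bits reported by the device. */
#define RX_STATUS_DONE   0x1u
#define RX_STATUS_ERROR  0x2u

/* Hypothetical completion descriptor written by the device. */
struct rx_desc {
    uint32_t status;  /* device-reported status bits */
    uint32_t length;  /* device-reported payload length in bytes */
    uint32_t index;   /* device-reported ring slot */
};

enum rx_check { RX_OK, RX_NOT_READY, RX_FAULT };

/*
 * Validate every device-supplied field before the driver acts on it.
 * Nothing in the descriptor is trusted until it has passed a check.
 */
enum rx_check rx_desc_check(const struct rx_desc *d)
{
    assert(d != NULL);

    if (d->status == 0xFFFFFFFFu) {
        return RX_FAULT;       /* all ones: device likely not responding */
    }
    if (d->status & RX_STATUS_ERROR) {
        return RX_FAULT;       /* device reports an error itself */
    }
    if (!(d->status & RX_STATUS_DONE)) {
        return RX_NOT_READY;   /* nothing to consume yet */
    }
    if (d->length == 0u || d->length > RX_BUF_SIZE) {
        return RX_FAULT;       /* length would overrun the buffer */
    }
    if (d->index >= RX_RING_SIZE) {
        return RX_FAULT;       /* slot lies outside the ring */
    }
    return RX_OK;
}
```

Only a descriptor for which `rx_desc_check` returns `RX_OK` would be passed on to the rest of the system; an `RX_FAULT` result would instead trigger containment before the bad length or index can alter system state.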
As with any other aspect of a computer system, it is desirable to be able to test a device driver, and in particular the hardening of a device driver, to confirm that it can deal with the faults it will need to contend with in a computer system. The hardening of device drivers can be tested to some extent by physically modifying the system and device hardware to allow the introduction of the faults it is desired to test. However, this is an expensive and time-consuming task, and may in the end give only a limited ability to test possible faults.
Accordingly, an aim of the present invention is to provide a more effective way of testing that device drivers have been correctly and thoroughly hardened.