1. Field of the Invention
The present invention relates to diagnostic programs and techniques within a computer system, and more particularly to diagnostic techniques in which execution of the diagnostics is triggered during periods of low device activity.
2. Description of the Related Art
Present-day computer systems typically use a distributed software model for device control, in which device drivers control the various devices within the system, e.g., peripheral devices and system hardware components. The device drivers may be loaded into system memory or may instead reside in device memory. Further, systems with hierarchical interconnects typically load large numbers of device drivers, since each level of the hierarchy generally has at least one device driver and, in some cases, each device has its own associated device driver image.
To perform adequate system diagnostics, at least the system hardware devices, and desirably the peripheral devices as well, must be tested. However, activity on a device must typically cease while its diagnostics are performed. That is, the ordinary operation of the device must be halted before diagnostic operations can commence. Further, in device driver configurations in which a diagnostic device driver is separate from the driver that controls ordinary operation, the ordinary device driver must be unloaded, or at least placed in a state that allows the diagnostic driver to access the device, and the device state must generally be preserved throughout the diagnostic process. Unless the device state is preserved in this manner, diagnostics can be performed only at system startup or shutdown.
However, performing diagnostics interrupts the operation of a system and its devices. During periods of high activity, the device state that must be preserved can be very large, requiring significant storage and transfer time. The period during which ordinary operation is disrupted is also not trivial: loading and unloading drivers and diagnostic applications can introduce significant delays, and some diagnostics, such as exhaustive memory tests on large peripheral device buffers, may require long execution times.
Further, it is desirable to perform diagnostics in parallel, as parallel operation provides faster results and should minimize the total time during which the system, or portions of it, is undergoing diagnostic evaluation. However, since devices must typically cease their ordinary functions during diagnostics, parallel diagnostic operation is typically not performed, both because taking several devices out of service at once generally has a larger impact on the system and because unpredictable system demands may raise activity levels across multiple devices. Therefore, diagnostics are typically performed serially and under manual control, so that the user controlling the diagnostics can determine whether they can be performed in view of current system traffic and can ensure that system resources will remain adequate to meet service requirements during the diagnostics. For example, in a server array, diagnostics may be run serially on the network adapters in one server, so that the re-routed traffic increases only by the traffic associated with a single adapter. Conversely, if diagnostics are run in parallel on all of the network adapters on a server, the re-routed traffic could reach the maximum traffic allocated to that server.
Therefore, it would be desirable to minimize both the impact of device diagnostics on actual device operation and the impact of device activity on the diagnostics themselves. It would further be desirable to provide a diagnostic scheme in which device diagnostics can be performed in parallel within a system without severely impacting system performance.