A large-scale computing system generally includes a storage controller, such as the IBM® Enterprise Storage Server®, which processes input/output (I/O) commands from one or more host devices, such as an IBM S/390®, to write data to or read data from one or more storage devices, such as hard disk arrays, storage libraries or the like. Such controllers include error handling routines to process errors in the various I/O adapters through which external devices, such as hosts, servers and storage devices, are attached to the storage controller. Although many errors may be “cleared” by resetting error registers in various components within the controller, many other types of errors can be recovered from only by a hardware reset.
As will be appreciated, a hardware reset is time-consuming and very disruptive to host operations. In a typical prior art recovery process, directed by an error handler, microprocessor code must be reloaded, and built-in self-tests and power-on self-tests must be run, before registers may be initialized. Moreover, global structures that are shared and exchanged with other processors must be updated.
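By way of illustration only, the two prior art recovery paths described above may be sketched as follows. This is a minimal sketch rather than any actual controller code. The class name, function name and step names are hypothetical. The position of the global structure update at the end of the sequence is also an assumption, since the description above requires only that the code reload and self-tests precede register initialization.

```python
from enum import Enum, auto


class ErrorClass(Enum):
    CLEARABLE = auto()        # recoverable by resetting error registers
    NEEDS_HW_RESET = auto()   # recoverable only by a hardware reset


# Hypothetical step names for the full reset path. The reload and the
# self-tests come before register initialization, as the text requires;
# placing the global structure update last is an assumption.
HW_RESET_STEPS = [
    "reload_microprocessor_code",
    "run_built_in_self_tests",
    "run_power_on_self_tests",
    "initialize_registers",
    "update_shared_global_structures",
]


def recovery_steps(error_class: ErrorClass) -> list[str]:
    """Return the ordered recovery steps for an I/O adapter error."""
    if error_class is ErrorClass.CLEARABLE:
        return ["reset_error_registers"]
    return list(HW_RESET_STEPS)


if __name__ == "__main__":
    for cls in ErrorClass:
        print(cls.name, "->", recovery_steps(cls))
```

The sketch makes the asymmetry plain: a clearable error takes a single step, whereas an error requiring a hardware reset triggers the entire sequence, during which the adapter is unavailable to the host.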
Consequently, a need exists for a less disruptive error recovery process in a device such as a storage controller.