1. Field of the Invention
The invention relates generally to hardware controllers and more specifically to methods and structures for persistently storing errors during overwrite events for error recovery.
2. Discussion of Related Art
Hardware controllers establish connections between devices and maintain those connections to ensure that Input/Output (I/O) requests from one device to another are properly conveyed. One example is a storage area network, where storage controllers process commands between initiators (e.g., host systems) and target devices (e.g., storage devices or expanders). The initiators are typically connected to the storage controller by one data transport medium (e.g., a Fibre Channel transport medium) and the target devices are connected to the storage controller by another data transport medium (e.g., a SCSI transport medium). Once connected, the different transport media and the various layers thereof (e.g., link layer, PHY layer, etc.) allow an initiator to communicate and exchange data with a target device.
This form of layered communications between initiators and targets has developed to provide reliable high-speed communications. However, errors can still occur and, in many instances, it is difficult, if not impossible, to determine the cause of an error or other issue because the flow of information through the storage controller can be particularly heavy at times. The difficulty generally arises because hardware controllers overwrite existing error information when another error is detected.
Because error information is not maintained in a hardware controller, the conditions under which the error occurred (e.g., hardware, software, and/or load conditions) generally need to be replicated. The hardware controller is then monitored to determine the cause of the error. This approach assumes that the error will recur under the replicated conditions, which is not guaranteed. Moreover, replicating the conditions can be time consuming and costly, and the replication process generally results in downtime for the hardware controller because it is removed from its typical operations. Hence, there exists a need to simplify error recovery in a hardware controller.