1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method and apparatus for pausing a send queue in a data processing system without causing sympathy errors.
2. Description of Related Art
In a System Area Network (SAN), the hardware provides a message passing mechanism that can be used for input/output (I/O) devices and for interprocess communications (IPC) between general computing nodes. Consumers access SAN message passing hardware by posting send/receive messages to send/receive work queues on a SAN channel adapter (CA). The send/receive work queues (WQ) are assigned to a consumer as a queue pair (QP). The messages can be sent over five different transport types: Reliable Connected (RC), Reliable Datagram (RD), Unreliable Connected (UC), Unreliable Datagram (UD), and Raw Datagram (RawD). Consumers retrieve the results of these messages from a completion queue (CQ) through SAN send and receive work completions (WC). The source channel adapter takes care of segmenting outbound messages and sending them to the destination. The destination channel adapter takes care of reassembling inbound messages and placing them in the memory space designated by the destination's consumer.
Two channel adapter types are present, a host channel adapter (HCA) and a target channel adapter (TCA). The host channel adapter is used by general purpose computing nodes to access the SAN fabric. Consumers use SAN verbs to access host channel adapter functions. The software that interprets verbs and directly accesses the channel adapter is known as the channel interface (CI).
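The queue model described above can be illustrated with a minimal sketch. It is not the SAN verbs interface; the names QueuePair, segment, and transfer, the tuple layout of queue entries, and the 4-byte packet size are all assumptions made for illustration. The sketch shows a consumer's queue pair, segmentation at the source, reassembly into a posted receive buffer at the destination, and work completions placed on each completion queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Hypothetical model: one send and one receive work queue per consumer."""
    send_wq: deque = field(default_factory=deque)  # entries: (wr_id, message bytes)
    recv_wq: deque = field(default_factory=deque)  # entries: (wr_id, bytearray buffer)
    cq: deque = field(default_factory=deque)       # work completions: (wr_id, status)


def segment(message: bytes, mtu: int) -> list[bytes]:
    """Split an outbound message into packets of at most mtu bytes."""
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]


def transfer(src: QueuePair, dst: QueuePair, mtu: int = 4) -> None:
    """Move one posted message from src's send queue to dst's posted receive buffer."""
    wr_id, message = src.send_wq.popleft()
    packets = segment(message, mtu)            # source adapter segments the message
    recv_id, buffer = dst.recv_wq.popleft()    # destination uses a posted receive
    buffer.extend(b"".join(packets))           # destination adapter reassembles it
    src.cq.append((wr_id, "send complete"))
    dst.cq.append((recv_id, "receive complete"))
```

In this sketch a consumer would post a message with `qp.send_wq.append((1, b"hello world"))`, the peer would post a receive with `peer.recv_wq.append((7, bytearray()))`, and each side would later read its result from its own `cq`.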
In a SAN fabric such as that described above, when a work request is sent from a send work queue of a first consumer to a receive work queue of a second consumer, error conditions may occur. When a reliable datagram error condition occurs, both the send work queue and the receive work queue of the first consumer are placed in an error state, and an indication that an error has occurred is sent to the receive work queue of the second consumer. In response to receiving the error indication from the first consumer, the receive work queue of the second consumer is also placed in an error state.
The error state prevents other consumers from sending messages to the receive work queues placed in the error state and prevents the send work queues placed in the error state from sending messages. Thus, an error occurring in one consumer may be propagated to a number of other consumers, and so on. This is known as a sympathy error.
This cascading effect may become severe enough to affect all work queues in the SAN fabric. Thus, it would be beneficial to have an apparatus and method for preventing sympathy error in a SAN fabric system.
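The cascade described above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model of any particular adapter: the names Consumer, fail_send, and cascade are hypothetical, and the propagation rule (a consumer that attempts to send to a receive work queue already in the error state experiences an error itself) is a simplifying assumption about how the error spreads.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    """Hypothetical consumer with one send and one receive work queue."""
    name: str
    send_ok: bool = True
    recv_ok: bool = True


def fail_send(sender: Consumer, receiver: Consumer) -> None:
    """Conventional handling of an error on a send from sender to receiver."""
    sender.send_ok = False    # sender's send work queue enters the error state
    sender.recv_ok = False    # sender's receive work queue enters the error state
    receiver.recv_ok = False  # receiver's receive work queue enters the error state


def cascade(consumers: dict, traffic: list, first_error: tuple) -> list:
    """Return the names of consumers with any queue in the error state.

    traffic is a list of (sender_name, receiver_name) pairs that keep being attempted.
    Assumption: sending to a receive queue in the error state fails the sender too.
    """
    fail_send(consumers[first_error[0]], consumers[first_error[1]])
    changed = True
    while changed:
        changed = False
        for s, r in traffic:
            sender, receiver = consumers[s], consumers[r]
            if sender.send_ok and not receiver.recv_ok:
                fail_send(sender, receiver)
                changed = True
    return sorted(n for n, c in consumers.items() if not (c.send_ok and c.recv_ok))
```

With traffic A to B, C to B, and D to C, a single error on the A to B send leaves A, B, C, and D all holding at least one queue in the error state under this rule, while a consumer with no traffic toward them is untouched.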
The present invention provides an apparatus and method for pausing a send queue while preventing sympathy error from propagating through a SAN fabric system. The apparatus and method of the present invention place a send work queue in an error state, i.e., pause the send work queue, when an error occurs in the send work queue, but do not place any other work queues in an error state. In this way, the send work queue experiencing the error is not able to send any further messages until error recovery is performed. However, other work queues continue to be able to send and/or receive messages. Once error recovery is performed, the send work queue that was placed in the error state is returned to a working state and is able to continue to send messages. In addition, the send work queue that was in the error state will send the messages that it attempted to send at the time of the error. The messages sent will continue from a last known point at which the send work queue was operating properly. Other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following description of the preferred embodiments.
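The pause-and-resume behavior summarized above can be sketched as follows. This is an illustration only, not the preferred embodiment: the names State, SendWorkQueue, send_all, and recover, and the use of an index as the "last known point", are assumptions. The sketch pauses only the send work queue that sees the error, leaves other queues working, and after recovery resends starting with the message that failed.

```python
from enum import Enum


class State(Enum):
    WORKING = "working"
    PAUSED = "paused"  # the send work queue's error state


class SendWorkQueue:
    """Hypothetical send work queue that pauses itself on error."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.next_index = 0          # last known good point: first unsent message
        self.state = State.WORKING

    def send_all(self, transmit):
        """Send pending messages in order; pause on the first failure.

        transmit is a caller-supplied function returning True on success.
        Returns the messages delivered during this call.
        """
        delivered = []
        while self.state is State.WORKING and self.next_index < len(self.messages):
            message = self.messages[self.next_index]
            if transmit(message):
                delivered.append(message)
                self.next_index += 1
            else:
                self.state = State.PAUSED  # only this queue is affected
        return delivered

    def recover(self):
        """Error recovery is complete: return the queue to the working state."""
        self.state = State.WORKING
```

Because `next_index` advances only after a successful transmission, calling `recover()` and then `send_all()` again resumes with the message that was being sent when the error occurred, and no other queue's state is changed at any point.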