1. Field of the Invention
This invention generally relates to multiplexed client messaging conversations, and more specifically, the invention relates to a method and system for quiescing multiplexed client messaging conversations.
2. Background Art
In many state-of-the-art computer systems, a single communication channel can handle communications in both directions between a client and a server, and, in fact, one channel can handle communications in both directions between multiple clients and a server. Communicating two or more conversations over a single channel/socket, referred to as multiplexing, allows many application connections to proceed while limiting the resources they use and the connection time they may require. Typically, a number of features apply to the entire channel/socket and are established at channel/socket startup. Subsequent application connections with these features can connect across the same channel/socket without the need to re-establish the features for themselves. A typical example is the TLS/SSL security features of a channel/socket, which are often expensive to establish, but can be shared without re-establishing them by subsequent connections with the same TLS/SSL profile.
The simplest and most efficient multiplexing scenario would be: channel/socket-wide set-up and negotiation is conducted at start-up; all the conversations start, run and complete; and the channel/socket is taken down.
However, certain channel-wide events have to be processed while the channel is active with many conversations. A particularly important example of this is TLS/SSL secret key reset: the secret key established at channel start-up is recalculated periodically while the channel runs to prevent the key from becoming compromised. To achieve this, a potentially large number of multiplexed conversations must be temporarily stopped before the secret key reset can flow, and then these conversations must all restart.
Existing solutions for processing these channel-wide events include (1) waiting for existing conversations to complete, and (2) using single global “stop send” mutexes. These two solutions are discussed below.
Wait for Existing Conversations to Complete
One simple solution allowing mid-run channel renegotiation would be: (a) detect that renegotiation is needed; (b) block attempts to start new conversations; (c) wait for the existing conversations to complete; (d) renegotiate; and (e) allow new conversations to start again. For some applications, for instance interactive ones with short-running connections, this could be a workable strategy. But with a messaging infrastructure, the conversations can be very long-running, and the channel renegotiation would be unacceptably delayed if it could only happen when all the conversations had ended.
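Steps (a) through (e) can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular product; the class name `DrainingChannel`, its methods, and the `renegotiations` counter standing in for the actual renegotiation are all hypothetical.

```python
import threading


class DrainingChannel:
    """Hypothetical channel that renegotiates only after all conversations end."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0             # conversations currently running
        self._renegotiating = False  # True while steps (b) to (d) are in progress
        self.renegotiations = 0      # placeholder for the real renegotiation work

    def start_conversation(self):
        with self._cond:
            # (b) new conversations are held back while renegotiation is pending
            while self._renegotiating:
                self._cond.wait()
            self._active += 1

    def end_conversation(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def renegotiate(self):
        with self._cond:
            self._renegotiating = True   # (a) need detected, (b) block new starts
            while self._active > 0:      # (c) wait for existing conversations
                self._cond.wait()
            self.renegotiations += 1     # (d) renegotiate (placeholder)
            self._renegotiating = False  # (e) allow new conversations again
            self._cond.notify_all()
```

The loop in step (c) is where the delay arises: `renegotiate` cannot proceed until the longest-running conversation has called `end_conversation`.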
Single Global “Stop Send” Mutexes
A more flexible solution allows the channel renegotiation to occur while individual conversations are still active. A global send mutex is maintained on each side of the channel to serialize sends to the communications protocol from different threads. On the side of the channel that first detects the need for renegotiation, this mutex is set for the duration of the renegotiation. A flow is made to the other end of the channel to indicate that the first side has stopped sending and that the other end should too. That other end of the channel also sets its send mutex for the duration of the renegotiation. The renegotiation, which is normally initiated by the client end of the channel, can then take place (at a lower layer of code which does not need send mutexes as it will operate in half-duplex mode). Once the client is satisfied that this has completed, it enables the higher-layer sends again, and sends an indication to the server that it should do the same.
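The global send mutex arrangement might look roughly like the sketch below. The names `Endpoint`, `stop_sends`, `resume_sends` and `renegotiate` are assumptions made for illustration, the `wire` list stands in for the communications protocol, and the control flows between the two ends are represented only by log entries.

```python
import threading


class Endpoint:
    """Hypothetical end of a channel with one global send mutex."""

    def __init__(self):
        self._send_mutex = threading.Lock()
        self.wire = []  # stands in for the communications protocol

    def send(self, data, timeout=-1):
        # All higher-layer sends serialize on the global mutex.
        # timeout=-1 waits indefinitely; a positive timeout is for demonstration.
        if not self._send_mutex.acquire(timeout=timeout):
            return False
        try:
            self.wire.append(data)
            return True
        finally:
            self._send_mutex.release()

    def stop_sends(self):
        self._send_mutex.acquire()   # held for the duration of the renegotiation

    def resume_sends(self):
        self._send_mutex.release()


def renegotiate(client, server, log):
    client.stop_sends()
    log.append("client -> server: stop sending")
    server.stop_sends()
    log.append("renegotiation at lower layer (half-duplex, no send mutex)")
    client.resume_sends()
    log.append("client -> server: resume sending")
    server.resume_sends()
```

While either mutex is held, any thread calling `send` on that endpoint is suspended until `resume_sends` is called.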
In order to implement a multiplexed system, a thread is needed for receiving data at both ends of the channel at all times. This thread will sometimes be responsible for sending data too, if a relatively simple response is required, and so is controlled by the sending mutex. But it is also required to receive the data for the channel renegotiation. As neither receive thread can be allowed to send data while the channel is quiescing, the threads are subject to the global “send data” mutex. But if either receive thread gets suspended on this mutex, the channel renegotiation will fail, as its flows will not be received, and the channel will hang.
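The hazard can be reproduced in a small sketch. The function `receive_loop` and the flow names `REQUEST`, `RENEGOTIATE` and `CLOSE` are hypothetical, and a timeout is used on the mutex purely so that the demonstration terminates; in the situation described above the receive thread would wait indefinitely.

```python
import queue
import threading


def receive_loop(inbound, send_mutex, log, send_timeout=0.2):
    """Hypothetical receive thread that also sends simple replies."""
    while True:
        flow = inbound.get()
        if flow == "CLOSE":
            return
        if flow == "RENEGOTIATE":
            log.append("renegotiation flow received")
        elif flow == "REQUEST":
            # The reply is sent from the receive thread, so it needs the send mutex.
            if send_mutex.acquire(timeout=send_timeout):
                try:
                    log.append("reply sent")
                finally:
                    send_mutex.release()
            else:
                # Without the demonstration timeout the thread would stay
                # suspended here and never read the flows queued behind it.
                log.append("receive thread stalled on send mutex")
                return


inbound = queue.Queue()
send_mutex = threading.Lock()
send_mutex.acquire()            # channel is quiescing: sends are stopped
inbound.put("REQUEST")          # arrives ahead of the renegotiation flow
inbound.put("RENEGOTIATE")
log = []
worker = threading.Thread(target=receive_loop, args=(inbound, send_mutex, log))
worker.start()
worker.join()
```

After the run, `log` records only the stall, and the `RENEGOTIATE` flow is still sitting unread in `inbound`.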
In spite of this, the global send mutex solution can be made to work as long as certain, not unreasonable, limitations are imposed on the flows that can be received.
In general, the application connections on the client will each have their own threads, which will send application data. The receive thread on the client passes the responses to the application requests back to the application threads as soon as they are received and is not responsible for sending anything back in response to them. On the other hand, the receive thread on the server will often process short requests from the client applications and will itself send the responses back. So, quiesce flows are configured so that the active quiescing is always initiated from the client; if the server detects the need for a renegotiation, it just tells the client to go into active quiescing mode; the server does not stop its own sending until the client subsequently tells it to.
The client sets its global send mutex as soon as it has become aware, either from its own processing or via a flow from the server, that channel renegotiation is required; the application threads suspend as they try to send, but the receive thread never does, as it never sends anything. The client now tells the server to quiesce itself. Assuming the client's flows are received in order at the server, when the server gets the request to quiesce, it can be sure that no other flows will be received on the receive thread (which might have triggered a send from the receive thread and hence a deadlock) and so the server can safely set its global send mutex. The renegotiation and restarting of sends can then occur straightforwardly.
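The ordering argument can be illustrated with a single-threaded simulation. Everything here is a sketch under stated assumptions: the `Client` and `Server` classes, the `QUIESCE` and `RESUME` flow names, and the `deque` standing in for an in-order channel are hypothetical. The sketch also queues `RESUME` before releasing the client mutex so that no application request can overtake it, which is a choice of this sketch.

```python
import threading
from collections import deque


class Server:
    def __init__(self):
        self.send_mutex = threading.Lock()
        self.sent = []

    def on_flow(self, flow):
        """Called by the server receive thread for each flow, in order."""
        if flow == "QUIESCE":
            # Safe: on an ordered channel no client request can follow this flow.
            self.send_mutex.acquire()
        elif flow == "RESUME":
            self.send_mutex.release()
        else:
            # Short request answered directly from the receive thread.
            if not self.send_mutex.acquire(blocking=False):
                raise RuntimeError("receive thread would deadlock")
            try:
                self.sent.append("reply:" + flow)
            finally:
                self.send_mutex.release()


class Client:
    def __init__(self, channel):
        self.send_mutex = threading.Lock()
        self.channel = channel

    def app_send(self, request):
        with self.send_mutex:            # application threads suspend here
            self.channel.append(request)

    def quiesce(self):
        self.send_mutex.acquire()        # stop application sends first
        self.channel.append("QUIESCE")   # control flow bypasses the mutex layer

    def resume(self):
        self.channel.append("RESUME")
        self.send_mutex.release()


def deliver(channel, server):
    while channel:
        server.on_flow(channel.popleft())


channel = deque()
client, server = Client(channel), Server()
client.app_send("req1")
client.quiesce()
deliver(channel, server)   # req1 is answered, then the server stops its sends
quiesced = server.send_mutex.locked()
client.resume()
deliver(channel, server)
```

Because `req1` was queued before `QUIESCE`, the server receive thread replies to it while its mutex is still free, and it only sets the mutex once nothing further can arrive that would require a send.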
While such a system is workable, it relies on the restriction that the client receive thread can never send data. In a more sophisticated system, this limitation can be undesirable. For instance, bidirectional heartbeats (i.e. heartbeats initiated independently by both the server and the client) can be used to improve reliability and performance (see, for example, Cisco WAN Manager Peer-to-Peer Communication: http://www.cisco.com/univercd/cc/td/doc/product/wanbu/svplus/15/ug/app3.htm). A server-initiated heartbeat flow is not connected with a particular application thread and should be responded to immediately from the client receive thread. Once this flow, or similar flows, are allowed, the single global “stop send” mutexes approach leads to the possibility of a client receive thread deadlock in the window between the time the client stops its sends and indicates this to the server, and the time the server receives that indication.