The present invention relates to network communications and more particularly to network communications to a cluster of data processing systems.
As use of the Internet has increased, so has the demand placed on servers on the Internet. One technique used to address this increase in demand has been the use of multiple servers which perform substantially the same function. Applications, such as Telnet or Internet Message Access Protocol (IMAP)/Post Office Protocol 3 (POP3) mail serving, may need to connect a Transmission Control Protocol (TCP) client to a particular TCP server of a set of similar, but not identical, servers. However, the particular server instance typically cannot be selected until after information from the client has been received. For example, an Internet Service Provider (ISP) may have multiple e-mail servers to which users may connect to obtain their mail. When multiple servers which perform substantially the same function are present, selecting the server to which a user should be connected may present difficulties.
In the e-mail example discussed above, one conventional approach has been to have users individually configure each client application to request a connection to a dedicated server. This may be accomplished by providing each server with a unique name or Internet Protocol (IP) address and configuring the client to specify the name or address when making a connection. Such approaches may, however, present difficulties in maintaining a balanced workload between the servers as users come and go. Furthermore, reconfiguring a large population of client applications may present administrative difficulties.
Another approach to routing clients to specific servers has been through an application unique protocol between the client and the server which performs application redirection. In application redirection, a client typically establishes a first connection to a first server which sends a redirect instruction to the client. Upon receiving the redirection instruction, the client disconnects from the initial server and establishes a second connection to the specified server. One difficulty with such an approach, however, is that the client and the server typically must implement the application-unique protocol to provide the redirection and, thus, the redirection is not transparent to the client.
Another approach is known as proxying, where the client establishes an initial connection to a proxy application and the proxy application forms a second connection with the proper server after obtaining enough information from the client to select a server. Such an approach may have the advantage that the selection and communication with the selected server by the proxy may be transparent to the client. However, both inbound and outbound communications must, typically, traverse a protocol stack twice to be routed by the proxy application. First, the communications traverse the protocol stack to the proxy application and again traverse the protocol stack when routed by the proxy application. Such traversals of the protocol stack may consume significant processing resources at the server executing the proxy application.
In addition, if the data between the client and the server is encrypted, it may not be possible for the proxy to decrypt the data and select the proper server. For example, for Secure Sockets Layer/Transport Layer Security (SSL/TLS), the proxy typically must share the SSL/TLS keys with each server for which it proxies. For Internet Protocol Security (IPSec), the proxy typically either acts as the IPSec endpoint for both the client and server or must share the Security Association with the server. In all such cases, in order for the proxy to examine the protocol content, end-to-end security must generally be broken.
In additional approaches, the client establishes a connection to a proxy, which in turn establishes a connection to the ultimate server. Either at a low level in the stack or by instructing an external router, a TCP connection translation function is set up which causes the router or stack to perform modifications on all incoming and outgoing TCP segments. The modifications may include the server-side address (destination for incoming requests, source address for outgoing replies) in the IP header, sequence numbers in the TCP header, window sizes, and the like. Such an approach may not require traversal of the entire TCP stack, but may result in every TCP segment requiring modification, and if IP addresses flow in the application data, the connection translation function may not translate such addresses unless specifically programmed to scan all the application data. This approach also generally requires all flows for the connection, both inbound and outbound, to traverse a single intermediate node, making it a single point of failure (like the proxy).
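The per-segment rewriting performed by such a connection translation function can be pictured with the following minimal sketch. It is an illustration only: the `Segment` and `Translation` types, their field names, and the simplifying assumption that only the server-side sequence space differs from the one the client sees are hypothetical and are not taken from any of the approaches described above.

```python
from dataclasses import dataclass, replace

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    """Hypothetical, heavily reduced view of a TCP/IP segment."""
    src: str
    dst: str
    seq: int
    ack: int


@dataclass(frozen=True)
class Translation:
    """Hypothetical per-connection translation entry."""
    proxy_addr: str   # address the client originally connected to
    server_addr: str  # address of the ultimate server
    seq_delta: int    # server-side sequence offset relative to the proxy's


def translate_inbound(seg: Segment, t: Translation) -> Segment:
    """Client to server: rewrite the destination address and the ack number."""
    return replace(seg, dst=t.server_addr,
                   ack=(seg.ack + t.seq_delta) % SEQ_MOD)


def translate_outbound(seg: Segment, t: Translation) -> Segment:
    """Server to client: rewrite the source address and the sequence number."""
    return replace(seg, src=t.proxy_addr,
                   seq=(seg.seq - t.seq_delta) % SEQ_MOD)
```

Even in this reduced form, every segment in both directions is touched, which is the cost noted above; addresses carried inside the application data are not seen by either function.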
Furthermore, the Locality-Aware Request Distribution system developed at Rice University is described as providing content-based request distribution which may provide the ability to employ back-end nodes that are specialized for certain types of requests. A "TCP handoff protocol" is described in which incoming requests are "handed off to a back-end in a manner transparent to a client, after the front-end has inspected the content of the request." See Pai et al., "Locality-Aware Request Distribution in Cluster-based Network Servers", Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, Calif., October, 1998. See also, Aron et al., "Efficient Support for P-HTTP in Cluster-Based Web Servers", Proceedings of the USENIX 1999 Annual Technical Conference, Monterey, Calif., June, 1999. However, this approach is generally directed to a stateless environment with well defined requests, each of which may be distributed to different nodes.
Often, however, a communication connection such as an Internet connection may be used for transaction processing where the transaction may involve more than one request/response pair. In the Locality-Aware Request Distribution system described above, all of the requests of a transaction would be individually routed on a request-by-request basis even where they were all routed to the same server. In addition, a server may have state which needs to be transferred along with a connection to support a handoff that is transparent to the user. The Locality-Aware Request Distribution system described above does not provide for such state-based transfers.
Other approaches to moving a client connection or session from one server to another include Virtual Telecommunications Access Method (VTAM) multi-node persistent session (MNPS) support. VTAM multi-node persistent session support allows for recovering a Systems Network Architecture (SNA) session state on another VTAM when an application fails and is restarted. However, a client typically must re-authenticate to the applications or other system using multi-node persistent sessions. Furthermore, such a movement from a first VTAM to a second VTAM typically only occurs after a failure.
VTAM also supports CLSDEST PASS, which causes one SNA session with the client to be terminated and another initiated without disrupting the application using the sessions. Such a movement from one session to another, however, typically requires client involvement.
Embodiments of the present invention include methods, systems and computer program products for transferring a Transmission Control Protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems. A connection is established between the client device and a routing node coupled to the cluster of data processing systems utilizing a communication protocol stack at the routing node, the protocol stack having an associated state. An operating system kernel of the routing node obtains application level information from an initial transaction received from the client over the connection. The transaction includes at least one request. The operating system kernel also selects a target application at a first data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information and transfers the connection to a target communication protocol stack on the first data processing system associated with the selected target application including providing the associated state information of the communication protocol stack of the routing node.
The target communication protocol stack accepts the connection from the routing node based on the provided associated state information of the communication protocol stack of the routing node so as to, transparently to the client, establish communications between the client and the target application. A notification of completion of the transaction is received from the target application and the connection is made available to a routing device for selection of a next target application to receive the connection responsive to receipt of the notification of completion of the transaction. The routing device may be the routing node or the first data processing system.
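The transfer and return of a connection described above can be illustrated with a small model. This is a sketch under stated assumptions, not the claimed implementation: the `StackState` fields, the `Stack` and `RoutingNode` classes, and the choice of the routing node (rather than the first data processing system) as the routing device are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

ConnKey = Tuple[str, int]  # (client address, client port)


@dataclass
class StackState:
    """Hypothetical subset of the protocol stack state for one connection."""
    client_addr: str
    client_port: int
    snd_nxt: int           # next sequence number to send to the client
    rcv_nxt: int           # next sequence number expected from the client
    buffered: bytes = b""  # client data received but not yet delivered


class Stack:
    """A communication protocol stack that can accept and release state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connections: Dict[ConnKey, StackState] = {}

    def accept_transfer(self, state: StackState) -> None:
        # The client sees the same sequence space, so nothing changes for it.
        self.connections[(state.client_addr, state.client_port)] = state

    def release(self, key: ConnKey) -> StackState:
        return self.connections.pop(key)


class RoutingNode(Stack):
    """Routing node that hands connections to target stacks and back."""

    def __init__(self, name: str, targets: Dict[str, Stack]) -> None:
        super().__init__(name)
        self.targets = targets

    def hand_off(self, key: ConnKey, target_name: str) -> None:
        # Transfer the connection together with its associated stack state.
        self.targets[target_name].accept_transfer(self.release(key))

    def on_transaction_complete(self, key: ConnKey, target_name: str) -> None:
        # The connection becomes available again for the next selection.
        self.accept_transfer(self.targets[target_name].release(key))
```

In this model the routing node's own entry for the connection is freed as soon as the hand-off occurs, and the state that comes back after the transaction reflects whatever the target stack advanced while it owned the connection.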
In other embodiments of the present invention, receiving a notification of completion of the transaction from the target application includes receiving the notification from the target application at the target communication protocol stack over the connection as data which is detectable by the target communication protocol stack as being directed to the target communication protocol stack rather than the client. The data may be received as ancillary data of a sendmsg socket call. The data also may include application state information associated with the connection from the target application. In such embodiments, the application state information may be provided to the routing device for use in transferring the connection to the next target application. A communication protocol stack associated with the next target application may receive the application state information and the notification as ancillary data of a recvmsg socket call. The associated state information of the target communication protocol stack may be provided to the routing device for use in transferring the connection to the next target application. In various embodiments, the application state information may be a null set.
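The way a stack might tell a completion notification apart from client data can be modeled as follows. The sketch deliberately does not use the operating system socket API: `TargetStack`, its `sendmsg` method, and the `TXN_COMPLETE` ancillary type are hypothetical names chosen for illustration, since no constant or data layout is specified above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# Hypothetical ancillary data type; the text names no actual constant.
TXN_COMPLETE = 0x7001

Ancillary = Tuple[int, bytes]  # (type, payload)


@dataclass
class TargetStack:
    """Model of a target stack that intercepts stack-directed ancillary data."""
    to_client: List[bytes] = field(default_factory=list)
    completed: bool = False
    app_state: Optional[bytes] = None

    def sendmsg(self, data: bytes, ancdata: Sequence[Ancillary] = ()) -> int:
        """Ordinary data goes to the client; ancillary data stays in the stack."""
        for kind, payload in ancdata:
            if kind == TXN_COMPLETE:
                self.completed = True
                self.app_state = payload  # may be empty, i.e. a null set
        if data:
            self.to_client.append(data)
        return len(data)

    def export_for_next_target(self) -> List[Ancillary]:
        """Ancillary data that the next target's stack could surface on receive."""
        if not self.completed:
            return []
        return [(TXN_COMPLETE, self.app_state or b"")]
```

A target application would call `sendmsg(b"", [(TXN_COMPLETE, state)])` when its transaction ends; nothing is sent to the client, and the application state travels with the connection to the next target.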
In further embodiments of the present invention, an operating system kernel of the routing device obtains application level information from a next transaction received from the client over the connection, the next transaction including at least one request. A next target application at a data processing system of the cluster of data processing systems is selected for transfer of the connection based on the application level information obtained from the next transaction. The connection is transferred to the communication protocol stack associated with the next target application, including providing to that communication protocol stack the associated state information of the target communication protocol stack and the application state information associated with the connection. The selection of a target application may be carried out by a policy-based engine of the operating system kernel of the routing node.
Sufficient application level information may be obtained to identify the initial transaction. In various embodiments, the application level information is obtained by executing application-specific exits within the operating system kernel of the routing node to examine data associated with the transaction so as to identify the transaction to the operating system kernel of the routing node. The communication protocol stack of the routing node may be made available after the connection is transferred. In further embodiments of the present invention, the connection is an encrypted connection and the provided associated state information includes encryption information.
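An application-specific exit and a policy-based selection of the kind described above might look like the following sketch, written as ordinary user-level Python rather than kernel code. The function names, the use of the POP3 `USER` command as the identifying data, and the first-letter policy table are assumptions made purely for illustration.

```python
from typing import Dict, Optional


def pop3_user_exit(data: bytes) -> Optional[str]:
    """Hypothetical exit: return the mailbox name once a complete
    'USER <name>' line has arrived, or None if more data is needed
    or the line is not a USER command."""
    line, sep, _ = data.partition(b"\r\n")
    if not sep:
        return None  # the first line is not complete yet
    parts = line.split()
    if len(parts) == 2 and parts[0].upper() == b"USER":
        return parts[1].decode("ascii", errors="replace")
    return None


def select_target(mailbox: str, policy: Dict[str, str], default: str) -> str:
    """Toy policy engine: map the first letter of the mailbox to a server."""
    return policy.get(mailbox[:1].lower(), default)
```

For example, with a policy of `{"a": "mail1"}` and a default of `"mail0"`, the data `b"USER alice\r\n"` would be identified as mailbox `alice` and routed to `mail1`, while an incomplete line leaves the connection at the routing node until more data arrives.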
In other embodiments of the present invention, methods, systems and computer program products are provided for transferring a Transmission Control Protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems. A connection is established between the client device and a first application at a first data processing system of the cluster of data processing systems utilizing a first communication protocol stack associated with the first application, the first communication protocol stack having an associated state. An operating system kernel of the first data processing system obtains application level information from a transaction received from the client over the connection, the transaction including at least one request. In addition, application state information associated with the connection is obtained from the first application.
A second application at a second data processing system of the cluster of data processing systems is selected for transfer of the connection based on the obtained information and the connection is transferred to a second communication protocol stack on the second data processing system associated with the selected second application. The transfer includes providing to the second data processing system the associated state information of the first communication protocol stack and the obtained application state information associated with the connection from the first application.
While the invention has been described above primarily with respect to the method aspects of the invention, systems and/or computer program products are also provided.