The communications industry is rapidly changing to adjust to emerging technologies and ever-increasing customer demand. This customer demand for new applications and increased performance of existing applications is driving communications network and system providers to employ networks and systems having greater speed and capacity (e.g., greater bandwidth). In trying to achieve these goals, a common approach taken by many communications providers is to use packet switching technology.
It is increasingly common for a client of an Internet service to issue multiple service requests over a single transport (e.g., Transmission Control Protocol, a.k.a. “TCP”) connection. For example, with HTTP 1.1 persistent connections, multiple HTTP requests can be made over a single (persistent) connection. Similarly, with NFS over TCP, multiple read, write, etc. operations are typically performed over the same connection, given that the connection is established at the time the server is “mounted” and remains until the server has been unmounted or a significant failure occurs. Similar behavior arises with CIFS connections.
It is also common to implement scalable services as a cluster of physical servers connected to the Internet or an intranet through a load-balancing switch, with the cluster appearing on the Internet as a single virtual host. From the client's perspective, its transport level connection is with this virtual host. The client is oblivious to the multiple physical servers.
In this configuration, if a client's request is simply directed to a particular physical server based on load or randomly, each physical server either needs to have a local copy of all the content the client can request or needs to communicate with the “home” server for the content. With a non-trivial cluster of k servers, fully replicating the content is impractical in general and raises the issues and cost of maintaining coherency. Also, if each server communicates with a home server for the content, statistically most of the client requests would involve this extra server-to-server communication, limiting the benefits of the cluster architecture. With a large cluster, the benefits of a load-sensitive approach may be limited because real hotspots are unlikely with many physical servers.
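The statement that most requests would involve extra server-to-server communication can be illustrated with a short calculation. The sketch below rests on a simplifying assumption not stated in the text: each request is assigned to one of the k servers uniformly at random, independently of where its content is homed, so it reaches its home server with probability 1/k. The function name `remote_fraction` is hypothetical.

```python
from fractions import Fraction


def remote_fraction(k: int) -> Fraction:
    """Fraction of requests that land on a server other than the home server.

    Assumption (illustrative only): requests are assigned uniformly at
    random among k servers, independent of the content's home server.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # A request hits its home server with probability 1/k, so the
    # remaining (k - 1)/k of requests need a server-to-server fetch.
    return Fraction(k - 1, k)


if __name__ == "__main__":
    for k in (2, 4, 16):
        print(k, remote_fraction(k))
```

Under this assumption the fraction exceeds one half for any cluster of more than two servers and approaches one as k grows.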
Another approach is to have a set of backend servers that are shared among all the “front-end” servers. However, this introduces an extra level of server machines and their associated cost, and also leads to coherency issues if the front-end machines cache data, and limited performance if they do not.
An alternative configuration is for the load-balancing switch to scan each client request and redirect the client connection transparently to the appropriate server holding this content. For example, a content switch typically uses network address translation (NAT) to translate the client packets on its apparent connection to the service virtual host to a selected physical server, changing this mapping each time a new client request asks for content that is homed on a different physical server. However, this implementation relies on a client receiving the response from one request before issuing the next request, thereby allowing the switch to know when to redirect the return flow from the server to the client to the next physical server. If the client pipelines its requests, the switch must somehow detect the end of one response and splice in the return flow from the next server, potentially well after several client requests have been forwarded on to different servers for response.
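The ordering problem raised by pipelined requests can be sketched as follows. This is a simplified illustration only, not a description of any particular switch: it buffers whole responses rather than splicing packet flows, and the class `ContentSwitch`, the `HOME` map, and the server names are all hypothetical.

```python
from collections import deque

# Hypothetical map from content path to the physical server that homes it.
HOME = {"/a": "server1", "/b": "server2", "/c": "server1"}


class ContentSwitch:
    """Toy model: route each request to its home server, reply in order."""

    def __init__(self, home):
        self.home = home
        self.pending = deque()  # request ids in the order the client sent them
        self.ready = {}         # request id -> response already finished
        self.next_id = 0

    def forward(self, path):
        """Pick the home server for one pipelined request."""
        req_id = self.next_id
        self.next_id += 1
        self.pending.append(req_id)
        return req_id, self.home[path]

    def complete(self, req_id, response):
        """Record a finished response; return those now safe to send."""
        self.ready[req_id] = response
        out = []
        # Only the response at the head of the queue may go to the client;
        # later responses wait until every earlier one has finished.
        while self.pending and self.pending[0] in self.ready:
            out.append(self.ready.pop(self.pending.popleft()))
        return out


if __name__ == "__main__":
    switch = ContentSwitch(HOME)
    ids = [switch.forward(p)[0] for p in ("/a", "/b", "/c")]
    print(switch.complete(ids[1], "B"))  # server2 finishes first
    print(switch.complete(ids[0], "A"))  # server1 finishes the first request
```

In this sketch the response from the second server is held back until the first request has been answered, which is the point at which a real switch would need to detect the end of one response and splice in the next return flow.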
Needed are new systems and methods to allow pipelined client requests while switching the requests to the server most appropriate to handle each request.