Ever since the introduction of the microprocessor, computer systems have been getting faster and faster. In approximate accordance with Moore's law (based on Intel® Corporation co-founder Gordon Moore's 1965 publication, whose prediction was later revised to a doubling of the number of transistors on integrated circuits every two years), processor speed has increased at a fairly even rate for nearly three decades. At the same time, the size of both memory and non-volatile storage has also steadily increased, such that many of today's personal computers are more powerful than supercomputers from just 10-15 years ago. In addition, the speed of network communications has likewise seen astronomical increases.
Increases in processor speeds, memory, storage, and network bandwidth technologies have resulted in the build-out and deployment of networks with ever-increasing capacities. More recently, the introduction of cloud-based services, such as those provided by Amazon (e.g., Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3)) and Microsoft (e.g., Azure and Office 365), has resulted in additional build-out of public network infrastructure, in addition to the deployment of massive data centers, which employ private network infrastructure, to support these services. Additionally, the new generation (e.g., 4G) of mobile network data services is expected to significantly impact the utilization of land-line networks in the near future. The result of these and other considerations is that the utilization of computer networks is expected to continue to grow at a high rate for the foreseeable future.
Cloud services and other on-line sites, such as ecommerce sites, social networking sites, content hosting sites, and news sites, commonly use a multi-tier architecture having a Web server front-end coupled to one or more tiers of servers, such as application servers and database or storage servers. The Web server tier itself may employ a load distribution scheme that uses multiple levels arranged in a fan-out model. Load-spreading is also commonly deployed between Web server and application server tiers.
Technically, a Web server is more accurately called an HTTP (Hypertext Transfer Protocol) server. HTTP employs a request-response protocol using a client-server model. HTTP is a stateless protocol that was originally implemented such that a connection was closed after a single request/response pair. In HTTP 1.1 a keep-alive mechanism was added, under which a connection may be used for multiple requests; these connections are termed “persistent” connections. In addition, HTTP 1.1 introduced chunked transfer encoding to support data streaming using persistent connections.
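As an illustration of the chunked transfer encoding mentioned above, the following is a minimal sketch in Python, assuming the basic HTTP 1.1 framing in which each chunk is sent as its length in hexadecimal, a CRLF, the chunk data, and a CRLF, with a zero-length chunk marking the end of the body. The function names `encode_chunked` and `decode_chunked` are hypothetical, and the decoder ignores chunk extensions and trailer fields for simplicity.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body prematurely
        # <length in hex> CRLF <data> CRLF
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk, no trailer fields
    return bytes(out)


def decode_chunked(body):
    """Reassemble the payload from a chunked message body (sketch only)."""
    pos = 0
    data = bytearray()
    while True:
        eol = body.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; skip it.
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break
        data += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(data)


framed = encode_chunked([b"Wiki", b"pedia"])
payload = decode_chunked(framed)
```

Because each chunk carries its own length, a sender can stream a response of unknown total size over a persistent connection without closing the connection to signal the end of the body.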
Each server has its own network address, with public Web servers having public Internet Protocol (IP) network addresses that are encoded under IPv4 using a 32-bit addressing scheme or under IPv6 using a 128-bit addressing scheme. The Domain Name System (DNS) is used to map the host names in Web site URLs to their public IP addresses. Typically, there is only a single public IP address for the home pages of sites such as www.facebook.com and www.youtube.com. In order to handle the millions of requests received daily, these sites implement a load-spreading scheme under which each request is routed internally in the sites' private networks using one or more levels of fan out.
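The 32-bit and 128-bit address widths described above can be checked with Python's standard `ipaddress` module. This is a small sketch only; the two addresses are taken from ranges reserved for documentation and are not addresses of any site named in the text.

```python
import ipaddress

# Example addresses from documentation-only ranges (assumed for illustration).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the width of the address in bits.
v4_bits = v4.max_prefixlen
v6_bits = v6.max_prefixlen

# The packed form is the raw address as sent on the wire.
v4_bytes = len(v4.packed)
v6_bytes = len(v6.packed)

# An IPv4 address is simply a 32-bit unsigned integer.
v4_as_int = int(v4)

print(v4_bits, v4_bytes, v4_as_int)
print(v6_bits, v6_bytes)
```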
Early load-spreading schemes employed load-balancers or the like that used simple algorithms for balancing the incoming requests across multiple servers. For example, in view of the original HTTP request/response expectation, a round-robin scheme or the like was employed under which, for a 1-to-n load balancer, each server would handle every nth request. Under architectures that couple lower-tier application servers to higher-tier servers in a tree-like hierarchy, a given application server may only be accessed via a single routing path. Thus, for streaming connections in these architectures, all packets corresponding to the connection within the host's private network are routed along the same path.
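The round-robin scheme described above can be sketched in a few lines of Python. The class name `RoundRobinBalancer` and the server labels are hypothetical and serve only to show that, with n servers, each server receives every nth request.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order (1-to-n fan out)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-0", "server-1", "server-2"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Note that this scheme takes no account of which connection a request belongs to, which is acceptable when each connection carries a single request/response pair but not when a connection carries a stream of related packets.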
In contrast to private IP networks, the Internet comprises a large number of interconnected public networks, and employs a very large number of switching elements such as switches, routers, bridges, etc. Under a fundamental concept of the Internet, packets may be routed between the same source and destination endpoints using different routes, thus providing resiliency if some of the switching elements become disabled, and enabling dynamic changes to the network topology. However, for streaming connections and the like, it is advantageous to route packets along the same route, both over the public Internet portion of a route and the private network portion of the route.
It is often preferable for packets associated with a flow to arrive in order at the destination. To facilitate this, routers and switches are typically configured to choose the same next hop for a given flow. Similarly, it is usually preferable that load balancers and load splitters be configured to send packets belonging to the same flow to the same server.
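One common way to send every packet of a flow to the same server is to hash fields that identify the flow and use the result to index the server list. The following Python sketch assumes a flow is identified by a 5-tuple of source address, source port, destination address, destination port, and protocol; the function name `pick_server`, the use of SHA-256, and the server labels are assumptions made for illustration, and real switching hardware typically uses much cheaper hash functions.

```python
import hashlib


def pick_server(flow, servers):
    """Map a flow 5-tuple to one server, deterministically."""
    if not servers:
        raise ValueError("at least one server is required")
    # Serialize the tuple so that identical flows produce identical keys.
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-0", "server-1", "server-2", "server-3"]
# (source IP, source port, destination IP, destination port, protocol)
flow = ("198.51.100.7", 49152, "192.0.2.1", 80, "tcp")

first = pick_server(flow, servers)
second = pick_server(flow, servers)
```

Because the choice depends only on the flow's identifying fields, every packet of the flow reaches the same server without the load balancer keeping per-flow state. A simple modulus such as this one remaps many flows whenever the server list changes size.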