With the advent of the World Wide Web (“WWW”), a universal client-server computing platform has emerged on the Internet. A very large number of web servers on the Internet are serving web applications, which interact with web browsers acting as clients. A web application is typically organized into a hierarchy of webpages, scripted in Hypertext Markup Language (“HTML”) and/or Extensible Markup Language (“XML”). It operates under the HyperText Transfer Protocol (“HTTP”). A web application could itself be a suite of applications, including access and manipulation of databases, media and other resources hosted on one or more servers.
The resources provided by the servers are called up by their respective Uniform Resource Locators (“URLs”). Generally, a URL contains an IP address (or a domain name that resolves to one) that points to a server, followed by additional pointers to files residing on the server. In the case of a web application, a client browser can therefore access a webpage or a link by its URL. In particular, the browser typically first accesses the web application by its website address, which is a portal address of the application, calling up a homepage with links to the hierarchy of webpages. For example, a commercial entity may create an on-line shopping site, “www.onlineshop.com”, for customers to browse and purchase merchandise on the Internet. The domain name address “www.onlineshop.com” is an alias for the IP address that points to where the application resides on the Internet.
One problem this new computing paradigm presents is the need for the server hosting a web application to meet the potentially huge demand from the clients. The global nature of the Internet has meant that at any time there could be millions of clients attempting to access the same web application. A common solution is to host the web application in a data center.
FIG. 1 illustrates a data center hosting the web application by means of a server farm. Multiple replicas of the web application are made available from a group of servers, known collectively as a “server farm”. The data center provides a multiplicity of web servers and other related servers for hosting multiple copies of a web application and related resources. This architecture allows easy scaling of resource capacity to meet increased demand. When a client request comes in, a LAN/Web switch performs a load-balancing function by connecting it to one of the less busy servers in the group.
The URL of the website for the web application now points to the LAN/Web switch so that when a browser addresses the web application, the client packets are initially directed to the LAN/Web switch. The LAN/Web switch then switches the packets to one of the less busy servers in the data center based on load-balancing considerations. The switching is done using information associated with Layers 2–4 of the Open Systems Interconnection (“OSI”) model, or the more specific Transmission Control Protocol/Internet Protocol (“TCP/IP”).
FIG. 2 is a table illustrating the protocol layers of the OSI model, the corresponding TCP/IP protocol stacks, and the types of conventional switching and routing operable at each layer. According to the OSI model, each device on a network implements the seven OSI layers in a modular fashion. Starting with Layer 7, which is a software application at the top, each layer communicates with its immediately adjacent layers. As the layers get lower, the information to be sent out is increasingly packaged for the specific hardware of the device, ending in Layer 1, which is the physical communication channel itself. Under TCP/IP, Layers 1–2 are implemented as a subnet (or MAC) stack, Layer 3 as the Internet (or IP) stack, Layer 4 as the transport (or TCP/UDP) stack, and Layers 5–7 as the Application stack. Each stack is usually implemented by a software and hardware combination. Typically, data is generated by a first network device and is processed down the protocol stacks, from Layer 7 to Layer 1, into a packet, with each stack adding a header to the packet. The packet can then be sent via a physical channel to a second network device. The second network device processes the packet up the stacks starting from Layer 1, and unwraps the respective headers after terminating them at their associated stacks. At Layer 7, the application data of the first device is retrieved for interaction with the application of the second device.
FIG. 3 illustrates the various headers of an IP packet. Each IP packet consists of a data portion for carrying the data payload and a header portion for carrying overhead information. The header portion is further partitioned into layer- or protocol-dependent headers. For example, a Layer 2 or MAC header includes a destination MAC address and a source MAC address that respectively specify the destination and source hardware addresses of a node in a subnet. On a LAN, an IP packet is directed to a destination device by its destination MAC address. A Layer 3 or IP header includes a source IP address and a destination IP address that respectively specify the IP addresses of the source and destination nodes on the Internet. On the Internet, an IP packet is directed to a destination device by its destination IP address. A Layer 4 or TCP header includes a source TCP port and a destination TCP port that respectively specify the port numbers used by the source node and the destination node. On a device, an IP packet is directed to a destination port by its port number. In general, transporting a packet from one location to another requires processing of Layers 2–4 header information.
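By way of illustration only, the extraction of the Layer 2–4 addressing fields described above may be sketched in Python as follows. The sketch assumes an Ethernet frame carrying an IPv4 packet with a TCP segment; the function name parse_headers and the dictionary keys are hypothetical and are not taken from FIG. 3.

```python
import socket
import struct


def parse_headers(frame: bytes) -> dict:
    """Extract Layer 2-4 addressing fields from an Ethernet/IPv4/TCP frame.

    Illustrative sketch only; assumes no VLAN tag and no IPv6.
    """
    # Layer 2 (MAC) header: destination MAC, source MAC, EtherType (14 bytes).
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 packet")

    # Layer 3 (IP) header: the low nibble of the first byte gives the
    # header length in 32-bit words.
    ip_start = 14
    ip_header_len = (frame[ip_start] & 0x0F) * 4
    protocol = frame[ip_start + 9]
    if protocol != 6:
        raise ValueError("not a TCP segment")
    src_ip = socket.inet_ntoa(frame[ip_start + 12:ip_start + 16])
    dst_ip = socket.inet_ntoa(frame[ip_start + 16:ip_start + 20])

    # Layer 4 (TCP) header: source and destination ports come first.
    tcp_start = ip_start + ip_header_len
    src_port, dst_port = struct.unpack("!HH", frame[tcp_start:tcp_start + 4])

    def fmt(mac: bytes) -> str:
        return ":".join(f"{b:02x}" for b in mac)

    return {
        "dst_mac": fmt(dst_mac), "src_mac": fmt(src_mac),
        "src_ip": src_ip, "dst_ip": dst_ip,
        "src_port": src_port, "dst_port": dst_port,
    }
```

Each header is located only after the one beneath it has been measured, which mirrors the layer-by-layer processing described in connection with FIG. 2.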
The data portion of the IP packet contains Layer 7 information, which is data generated by the application. In web applications, the data will include HTTP headers. Since HTTP is not one of the basic OSI or TCP protocols but a high-level protocol associated with web applications, its header is regarded as application data and is therefore located in the data portion of the IP packet. The HTTP header includes a URL field for specifying the URL the packet is requesting. It may also include a cookie field for the application to communicate environmental information with the client.
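As a minimal sketch of how the URL and cookie fields might be recovered from the data portion of a packet, the following Python fragment parses an HTTP request held in a byte string. It assumes the whole request header arrives in a single payload; the function name parse_http_request and the sample cookie names are hypothetical.

```python
def parse_http_request(payload: bytes) -> dict:
    """Pull the requested URL and any cookies out of an HTTP request header.

    Illustrative sketch only; assumes the full header is in one payload.
    """
    # The header block ends at the first blank line.
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")

    # Request line, e.g. "GET /cart/checkout HTTP/1.1".
    method, target, _version = lines[0].split(" ", 2)

    # Remaining lines are "Name: value" header fields.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The Cookie field is a semicolon-separated list of name=value pairs.
    cookies = {}
    for pair in headers.get("cookie", "").split(";"):
        if "=" in pair:
            key, _, val = pair.strip().partition("=")
            cookies[key] = val

    return {"method": method, "url": target,
            "host": headers.get("host"), "cookies": cookies}
```

For instance, a request for “/cart/checkout” on “www.onlineshop.com” carrying a cookie would yield that path in the "url" entry and the cookie pairs in the "cookies" entry.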
As mentioned earlier, each device communicating on the Internet implements the TCP/IP stacks. For example, when a client computer running a browser requests a web page from a server, the client packets are typically routed by a number of routers and possibly a web switch before reaching the destination server. When a router intercepts the packets, they are processed from Layer 1 up to Layer 3, so that Layer 3 information such as the source and destination IP addresses can be extracted in order for the router to route the packets to the next device. When the packets get to a conventional web switch, they are only processed from Layer 1 up to Layer 4. In general, the upper layer information can only be obtained after all the lower layer stacks have been processed or “terminated”. Thus, the upper or deeper layer information of an IP packet requires more processing to obtain.
FIG. 2 also lists the common types of routing and switching that can take place at the various layers. LAN switches, such as an Ethernet switch for a LAN, operate at Layer 2 or the subnet stack of TCP/IP. Routers, operating at Layer 3 or the network layer, allow IP switching in which IP packets may be routed to a node having an IP address on the Internet. A router basically examines the destination IP address on a packet and looks up the corresponding output port number in its routing table in order to send the packet to the next node.
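The routing table lookup described above may be sketched, purely for illustration, as follows. The sketch assumes that the table maps IPv4 prefixes to output port numbers and that the most specific (longest) matching prefix wins, as is customary in IP routing; the table contents and the function name lookup_output_port are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, output port number).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), 1),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), 2),
    (ipaddress.ip_network("10.1.0.0/16"), 3),
]


def lookup_output_port(dst_ip: str) -> int:
    """Return the output port for a destination IP by longest-prefix match."""
    addr = ipaddress.ip_address(dst_ip)
    # Collect every route whose prefix contains the address.
    matches = [(net.prefixlen, port) for net, port in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    # The longest prefix is the most specific route.
    return max(matches)[1]
```

A linear scan is used here only for clarity; the lookup needs nothing above the Layer 3 destination address.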
As mentioned earlier, a web switch is employed to switch an incoming client packet to one of many parallel web servers in a data center. In conventional implementations, its primary task is to perform a load-balancing function, i.e., to distribute an incoming packet to the least busy server in the server farm. This is accomplished by monitoring the load condition of each server, and dynamically changing the destination IP and MAC addresses of a packet so that the packet is directed to the least busy server.
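A minimal sketch of such a load-balancing step is given below. It assumes that the load condition of each server is represented by a count of active connections, which is only one of several possible load measures; the Server class, its fields and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    ip: str
    mac: str
    active_connections: int = 0  # assumed load measure


def pick_least_busy(servers: list) -> Server:
    """Return the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active_connections)


def switch_packet(packet: dict, servers: list) -> dict:
    """Redirect a packet to the least busy server.

    The packet is modeled as a dict of header fields; only the
    destination IP and MAC addresses are rewritten.
    """
    target = pick_least_busy(servers)
    target.active_connections += 1
    return {**packet, "dst_ip": target.ip, "dst_mac": target.mac}
```

A practical switch would also have to send every later packet of the same connection to the same server, which this sketch does not attempt.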
In order to select the appropriate server, it is sometimes necessary to consider the type of service being requested. For example, the data center may have a group of HTTP servers dedicated to web service; or a group of S-HTTP servers dedicated to secure web service; or a group of SMTP servers dedicated to Simple Mail Transfer Protocol service; or a group of FTP servers dedicated to File Transfer Protocol service. In that situation, it is necessary for a web switch to determine the type of service requested in order to select a server from the appropriate group. When the service is associated with a particular transport protocol, Layer 4 header information will be useful in helping to select an appropriate server.
FIG. 4 illustrates conventional TCP port assignments for some of the more standard services. The Layer 4 header of a packet contains the destination TCP port number. By convention, if the destination TCP port number is 80, it can be assumed that the packet is associated with the HTTP protocol and therefore a web application. Similarly, if the port number is 25, the service is assumed to be SMTP, or if the port number is 20, the service is assumed to be FTP, etc.
Thus, existing web switches are capable of switching an incoming packet to the most appropriate server, based on server load conditions and/or Layer 4 (transport layer) information. More recently, there have even been suggestions of more refined load balancing based on Layer 7, or application layer, information, taking into account information derived from the HTTP header, such as the URL and cookie.
Also, there have been suggestions of making web switches capable of switching with some notion of Quality-of-Service (“QoS”). This is in view of certain applications, such as those involving Voice-Over-IP (“VoIP”) service under the H.323 standard, requiring a minimum standard for connection stability, low latency and bandwidth. The suggested solution is to provide dedicated VoIP servers that can provide the necessary quality of service for this purpose, and to have the web switch recognize VoIP packets through Layer 4 information in order to switch them to the VoIP servers. Thus QoS is provided essentially by switching the packets associated with a special application to a server dedicated to serving such special application.
Generally, load-balancing schemes enable the capacity of a website to be scaled to meet demand, and existing QoS schemes allow switching to a dedicated server that can provide the required quality of service. However, due to the enormous number of clients on the Internet that could potentially access a website, there will be times when the budgeted capacity of a website is exceeded. This is especially the case during certain peak times when a website could experience spikes in demand. For example, an online merchandising website could be especially busy during holiday seasons when the demand could increase by orders of magnitude. Under those peak demand circumstances, no amount of load balancing will suffice since all the available servers in the server farm are already fully committed. When additional requests arrive, the web switch can only make a best-effort attempt to deliver the packets to the saturated servers.
As the servers become busier with more requests, the quality of service decreases as a nonlinear function of the number of requests. With existing web switch capabilities, once the server farm is saturated, the quality of service deteriorates drastically for all clients accessing the website. There is no provision for distinguishing clients of differing importance or for according clients of high importance preferential access. Nor is there provision for ensuring sufficient server headroom so that clients of high importance will be served on demand. For example, this would be of significance for an online merchandising website during holiday seasons when excessive traffic may reduce the website to a crawl or render it totally incapacitated. It would be desirable to give preferred customers preferential access. A preferred customer may be one who is on a shopping cart page as compared to a regular customer who is merely browsing the catalog. Conversely, when certain client packets are deemed less preferential, it would be desirable to have a way to identify them and accord them the appropriate quality of service, or lack thereof. In security applications, it would be desirable to be able to identify those packets that are “packet non-grata” and have the switch direct them elsewhere or drop them altogether.
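The port-based service determination described above may be sketched as a simple table lookup. The sketch below uses only the three port assignments named in the preceding paragraph; the dictionary name SERVICE_BY_PORT and the function name classify_service are hypothetical, and an actual switch would carry a fuller table such as that of FIG. 4.

```python
from typing import Optional

# Destination TCP port -> assumed service, per the conventions named above.
SERVICE_BY_PORT = {
    80: "HTTP",
    25: "SMTP",
    20: "FTP",
}


def classify_service(dst_port: int) -> Optional[str]:
    """Infer the requested service from the Layer 4 destination port.

    Returns None when the port is not in the table, in which case the
    switch would have to fall back on some default handling.
    """
    return SERVICE_BY_PORT.get(dst_port)
```

The result can then be used to choose the server group from which a server is selected, without examining anything above the Layer 4 header.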