Modern Internet browsers cap the number of concurrent TCP connections that can be opened to a given host or domain. In many browsers, this limit has been set to six concurrent connections. One reason for configuring browsers to enforce such a limit is to reduce load on servers, which traditionally have had a difficult time managing large numbers of simultaneous TCP connections. Traditional process-based or thread-based servers, under heavy connection load, devote significant processor resources to polling each connection to determine whether traffic has arrived on that connection. If the load becomes too heavy, servers may experience a phenomenon referred to as response throttling, in which the servers become unable to respond to requests because their processors are overly taxed by polling connections. The requesting clients see the server as unresponsive, and requests often “time out” as the server fails to respond to them.
In recent years, event-based web servers, multiplexing Server Load Balancers, and Application Delivery Controllers have become widely available. These devices can handle tens of thousands or hundreds of thousands of TCP connections, meaning the servers no longer have to be “protected” from the clients, or may be operated with less protection from the clients. However, as web pages become more complicated, these browser-imposed limits on concurrent TCP connections can undesirably impair browser performance as perceived by the end user. Many modern web pages are rendered based upon an HTML file that references numerous separately downloadable objects. It is not uncommon for 50 to 100 objects served by the same domain to be referenced by a single HTML file, and the browser must download each object to properly or fully render the web page. A browser that has received such an HTML file parses the file and begins generating requests for the objects referenced therein. However, requests for these objects must be sent sequentially over the limited number of (e.g., six) concurrent TCP connections. After the first six requests are sent, using the most common limit as an example, subsequent requests must be queued by the browser until responses to the earlier requests have been received. This drastically underutilizes available bandwidth and leads to longer-than-necessary download times.
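The queuing effect described above can be illustrated with simple arithmetic. The following Python sketch is illustrative only: the function names and the fixed 100 ms round-trip time are assumptions chosen for the example, and the model ignores transfer time, object size, and congestion. It counts how many sequential request "rounds" are needed when a given number of objects must share a limited number of connections.

```python
def request_rounds(num_objects: int, max_connections: int) -> int:
    """Minimum number of sequential request rounds (hypothetical helper).

    Each connection carries one outstanding request at a time, so at most
    max_connections requests can be in flight during any one round.
    """
    if num_objects < 0 or max_connections < 1:
        raise ValueError("need num_objects >= 0 and max_connections >= 1")
    # Integer ceiling division: round up any partially filled final round.
    return (num_objects + max_connections - 1) // max_connections


def latency_floor_ms(num_objects: int, max_connections: int,
                     round_trip_ms: int) -> int:
    """Lower bound on elapsed time if every round costs one round trip.

    Assumption: transfer time is ignored, so this is only a floor.
    """
    return request_rounds(num_objects, max_connections) * round_trip_ms


if __name__ == "__main__":
    # 60 objects from one domain, assumed 100 ms round trip.
    for limit in (6, 12, 30):
        print(limit, request_rounds(60, limit), latency_floor_ms(60, limit, 100))
```

Under these assumptions, 60 objects over six connections require at least ten rounds, while doubling the connection limit halves that floor.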
Requests are sent serially over each connection, such that a next request cannot be sent until the server has fully responded to the prior request. Large objects that are requested early can thus delay the downloading of later queued objects. Packet loss and network congestion on one or more of the connections can further delay data transmission, as TCP automatically reduces the data transfer rate on the congested connection and only gradually builds it back up. Further, the rendering of the web page often cannot begin until a set of objects necessary to determine its layout and functionality (e.g., cascading style sheets and embedded scripts) has been downloaded. If these objects fall at the end of the queue, the web page may take even longer to render. These various factors result in agonizing delays for users, who stare at incomplete or blank browser pages as the browser churns and waits, attempting to retrieve the various objects necessary to render the web page over the limited number of connections. If the wait is too long, users may give up and move on to a different web page.
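The delay that early large objects impose on later queued requests can be sketched with a small simulation. The Python example below is a simplified model, not a description of any particular browser: the function name, the fixed per-object download durations, and the assumption that connections do not compete for bandwidth are all hypothetical. Each queued request is assigned, in order, to whichever connection becomes free first.

```python
import heapq


def simulate_downloads(durations_ms, max_connections):
    """Return the finish time (ms) of each request, in queue order.

    Assumptions (illustrative only): each object takes a fixed time to
    download, each connection serves one request at a time, and
    connections do not slow each other down.
    """
    if max_connections < 1:
        raise ValueError("max_connections must be at least 1")
    # Min-heap of the times at which each connection next becomes free.
    free_at = [0] * max_connections
    heapq.heapify(free_at)
    finish_times = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)   # earliest available connection
        end = start + duration
        finish_times.append(end)
        heapq.heappush(free_at, end)     # connection is busy until 'end'
    return finish_times


if __name__ == "__main__":
    # Six large objects, twelve small ones, then a style sheet queued last.
    queue = [600] * 6 + [100] * 12 + [100]
    for limit in (6, 12):
        style_sheet_done = simulate_downloads(queue, limit)[-1]
        print(f"{limit} connections: style sheet ready at {style_sheet_done} ms")
```

In this model the style sheet queued last is not available until 900 ms with six connections, because the six large objects occupy every connection for the first 600 ms. With twelve connections it is available at 300 ms, and if it were queued first it would be available at 100 ms regardless of the limit.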
Website operators who care about end-user performance, and/or whose sites are equipped with event-based servers or scalable application controllers, will want to increase the number of concurrent TCP connections made by clients so that more object downloads can be parallelized, thus reducing the overall “clock time” from initial page request to completed rendering.