Much of the Internet is dedicated to the World Wide Web, a system of data communication featuring visual pages of information, known as Web pages, displayed on computers pursuant to a request from a user. Web pages are created in hypertext markup language, more frequently known by the acronym “HTML,” as well as related high level page description languages. These languages use ordinary text to describe Web pages, with the text being transmitted from a server over the Internet to programs running on users' computers known as “browsers” that interpret the HTML code and create corresponding Web pages for viewing. The downloading of Web pages by users, also called clients, from servers takes considerable time at low bandwidths, common to most homes, and takes noticeable amounts of time at bandwidths used by many businesses. This delay is due to a combination of factors: slow servers, modem and network latency, and the bandwidth of the communication pipe. There is considerable ongoing effort to expand Internet bandwidth so that Web pages and associated files can be transmitted more rapidly.
Network bottlenecks are due in part to Web pages containing dynamic content, that is, content that is created “on the fly.” While dynamic content (for example, stock quotes or breaking news stories) may represent only a small proportion of a page's content, the entire page must be transmitted every time a user requests the page. If a user requests a page repeatedly over a short period of time, for instance if the user is tracking a certain stock's activity and requests a page with the stock quote five times in an hour, the page must be assembled and transmitted to the user for each request. This burdens both the server, which has to create the page, and the network, which transmits the information from the server to the user.
The idea of expanding effective Internet bandwidth by data compression is known. Programs such as GZIP, ZIP and LZP exist for file compression. Picture and video file compression exists under standards such as JPEG and MPEG.
LZW is another file compression scheme, in which a file is compressed using a table-based lookup algorithm. LZW is suitable for text compression as well as image compression, and is used in the GIF and TIFF image formats. A sample LZW compression algorithm works as follows. An input sequence of bits of a given length, together with a shorter code associated with that sequence, is entered into a table. If, as more input is read, a particular sequence is repeated, the shorter code is substituted, thereby achieving compression of the file. The look-up table is included with the compressed file for decoding purposes. The transmission of the look-up table with the compressed file is inefficient since it requires the use of bandwidth in excess of what would be required to transmit the file alone.
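The table-based substitution described above can be illustrated with a minimal sketch. It assumes the common textbook, byte-oriented form of the LZW encoder, in which the table starts with the 256 single-byte sequences and grows as input is read; the function name `lzw_compress` is chosen for illustration only, and this form may differ in detail from the sample algorithm described above.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of integer codes using a growing lookup table.

    Assumption: textbook byte-oriented LZW with an unbounded table.
    """
    # Start with one table entry per possible single byte (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Sequence seen before: keep extending it.
            current = candidate
        else:
            # Emit the code for the longest known sequence, then
            # register the new, longer sequence under a fresh code.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(text)
    print(len(text), "input bytes ->", len(codes), "codes")
```

Codes of 256 and above stand for repeated multi-byte sequences, which is where the saving comes from; inputs with little repetition gain little.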
Recently, computer scientists have realized that there could be compression of Internet data by observing sequences of data bits and assigning unique labels to these sequences. Peribit Networks, Inc. of Santa Clara, Calif., recently introduced a commercial product which is reported to use pattern-recognition algorithms that were used at Stanford University by Dr. Amit Singh to capture recurring sequences of base pairs in DNA for subsequent analysis. Applying the algorithm to data traffic, Peribit's software spots repetitive patterns in data packets and assigns labels to those patterns. The benefit is that by substituting the labels for repeating data packets, overall Internet traffic loads are claimed to be reduced by as much as 70%, perhaps more. The new compression scheme resembles other data-compression schemes, such as those used to create ZIP and LZP files where a token is inserted wherever there are repetitive strings of data. When decompressed, the tokens are expanded back into the original strings.
Most file compression schemes work within a defined range of a certain number of bytes of information. In contrast, Peribit's algorithms scour WAN packet streams over time without such a restriction. The Peribit software eliminates the file packing and unpacking associated with traditional compression. While the effort by Peribit is commendable, it is computationally expensive and requires the purchase of computer hardware for both the server and the client. Peribit is a point-to-point solution that is not suitable for applications such as Web serving, where there are millions of clients, none of which have the Peribit hardware. In addition, like LZW compression, Peribit transmits the token table with the compressed file and therefore has the same inefficiency as LZW compression.
Mun Choon Chan and Thomas Y. C. Woo's paper “Cache-based Compaction: A New Technique for Optimizing Web Transfer” proposes a new technique for reducing Web latency over a slow link. Chan and Woo argue that Web page service latency can be reduced when similar objects (e.g., Web pages having the same or similar URLs) that have already been requested and transmitted to the requester are used as references. If a requesting client has an older version of the requested page in its cache, only the changes, or deltas, in the current page need to be sent to the client. Although this paper discusses general approaches to the concept of cache-based compaction, it does not discuss any specific implementation.
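The idea of sending only deltas against a cached reference can be illustrated with a minimal sketch. It is not the method of the Chan and Woo paper; it assumes a simple copy/insert delta format invented here for illustration, built with Python's standard `difflib.SequenceMatcher`. The names `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list[tuple]:
    """Server side: describe `new` as copies from `old` plus literal inserts."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The client already holds this text: send only the offsets.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New text the client lacks: send it literally.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: the old text is simply not copied.
    return delta


def apply_delta(old: str, delta: list[tuple]) -> str:
    """Client side: rebuild the current page from the cached page and the delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    cached = "<html><body><h1>Quotes</h1><p>ACME: 10.25</p></body></html>"
    current = "<html><body><h1>Quotes</h1><p>ACME: 10.40</p></body></html>"
    delta = make_delta(cached, current)
    literal = sum(len(op[1]) for op in delta if op[0] == "insert")
    print("literal characters sent:", literal, "of", len(current))
    print("rebuilt correctly:", apply_delta(cached, delta) == current)
```

The saving depends on how much of the current page matches the cached reference; a client with no cached version must still receive the full page.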
Fourelle Systems, Inc. markets a bandwidth optimization product called Venturi. Venturi uses a collection of standard and proprietary algorithms to compress HTTP, HTML, POP3, SMTP, FTP and NNTP data. Fourelle's product determines the type of data being transmitted and applies the most appropriate compression methods at the application layer. U.S. Pat. No. 6,115,385, assigned to Fourelle, provides a gateway architecture which converts the native protocols of the client application and the server to a bandwidth-efficient protocol. The compression that can be achieved using this approach is limited to the type of algorithm called for each type of data. In other words, maximum compression may not be achieved for certain data types.
Vigos AG uses a combination of hardware and software for their Vigos Website Accelerator. The Accelerator sits at the Web server and runs as a reverse proxy. It uses standardized compression algorithms to reduce data volume by about 10 percent.
Cennoid Technologies offers FxP Compression. This compression approach, based in part on U.S. Pat. No. 5,949,355, “Method and Apparatus for Adaptive Data Compression,” chooses a compression scheme based on the type of data to be compressed. Repeated blocks of characters are encoded while other characters are eliminated. The compression data engine also remembers identical packets of data which have already been compressed.
wwWhoosh Inc. uses proprietary players, incorporated into a user's browser, and servers to accelerate delivery of Web content. The player acts as an Internet proxy and accelerates browser performance. The player also determines whether a requested URL is “wwWhoosh-enabled,” i.e., whether the proprietary server has repackaged the content associated with the URL so that it is more efficiently delivered. This approach achieves a compression rate of about 15% per requested page.
Netscaler offers a hardware solution to latency reduction. Cache redirectors are deployed at either the edge of the network or at a content server. The cache redirector sets up persistent connections between content and cache servers and filters out non-cacheable requests (i.e., requests for dynamic content, which cannot be answered by a cache server), which are sent to the content server. Other requests are fulfilled by the cache server.
FineGround Networks has also released products for achieving content acceleration. FineGround's approach is to transmit only the changes to a Web page that have occurred between successive requests from a particular user for the same Web page. FineGround's software is installed at the content provider between the content server and the Internet. This software must use a cookie to keep track of the pages in the user's cache, i.e., whether the user has the “base” page to which the material sent by FineGround applies modifications. The Web page is assembled by JavaScript contained in the page sent out by FineGround. FineGround's solution to content acceleration only comes into play when the user requests a page he or she has visited before; if the user does not already have the base page, the entire base page must be sent. The approach also requires manual tuning to inform the software which pages on the server are similar. Cookies must be enabled on the user's browser, and the browser must also support DHTML. The page name must also remain the same on subsequent loads because the delta differencing is based on the page name.
Fireclick, Inc. uses differential caching and predictive caching to reduce Web page service latency. Differential caching breaks each page down into dynamic and static portions. The static portions are templates, which are transmitted the first time a user accesses a page and thereafter are usually stored in the user's browser; only the dynamic portions are transmitted each time the user requests a different page. The user receives the dynamic content, a pointer to the cached template in the browser's cache, and instructions for assembling the page. JavaScript in the browser interprets the instructions and assembles the page. Latency is further reduced by predictive caching, in which a user's browsing patterns are analyzed and the templates for pages the user is likely to request are “predownloaded” (i.e., sent to the user's browser's cache before the user actually requests the page). As with FineGround, this approach requires “knowing” what is in the user's cache and requires that a page name remain the same on subsequent loads, since the page name is the mechanism used to determine which template to select.
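The split between a cached static template and per-request dynamic content can be illustrated with a minimal sketch. It is not Fireclick's implementation, which the text describes as JavaScript running in the browser; it is a language-neutral model using Python's standard `string.Template`, and the names `template_cache`, `receive_first_visit` and `assemble` are hypothetical.

```python
from string import Template

# Hypothetical stand-in for the browser cache of static templates,
# keyed by page name (the selection mechanism the text describes).
template_cache: dict[str, Template] = {}


def receive_first_visit(page_name: str, template_text: str, dynamic: dict) -> str:
    """First access: the template and the dynamic content are both transmitted."""
    template_cache[page_name] = Template(template_text)
    return assemble(page_name, dynamic)


def assemble(page_name: str, dynamic: dict) -> str:
    """Later accesses: only `dynamic` crosses the network; the template is local."""
    return template_cache[page_name].substitute(dynamic)


if __name__ == "__main__":
    page = receive_first_visit(
        "quote",
        "<h1>$symbol</h1><p>$price</p>",
        {"symbol": "ACME", "price": "10.25"},
    )
    print(page)
    # Second request for the same page name: only the two values are sent.
    print(assemble("quote", {"symbol": "ACME", "price": "10.40"}))
```

Because the template is looked up by page name, a page whose name changes between loads would miss the cache, which is the limitation noted above.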
Another approach for reducing Web page service latency is hierarchical caching. Information can be cached at several points in the network. Requests are routed to certain caches; as a rule, a request first checks a local cache, then a more distant, larger cache, and so on. Routing of requests is normally independent of the item sought: the same caches are checked each time regardless of what is sought. Drawbacks to this approach include difficulties in configuring neighbor caches, as well as potential network congestion as more and more caches are consulted, which may increase the latency associated with the hierarchical cache approach.
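The lookup order described above can be illustrated with a minimal sketch. It assumes a simple two-level parent chain and an in-memory store; the `Cache` class, its `get` method and the `fetch_origin` callback are hypothetical names, not part of any particular caching product.

```python
from typing import Callable, Optional


class Cache:
    """One level of a cache hierarchy; misses are forwarded to the parent."""

    def __init__(self, name: str, parent: Optional["Cache"] = None):
        self.name = name
        self.parent = parent
        self.store: dict[str, str] = {}

    def get(self, url: str, fetch_origin: Callable[[str], str],
            consulted: list[str]) -> str:
        # Record every cache touched, to show how a miss adds hops.
        consulted.append(self.name)
        if url in self.store:
            return self.store[url]
        if self.parent is not None:
            body = self.parent.get(url, fetch_origin, consulted)
        else:
            # Top of the hierarchy: go to the origin server.
            body = fetch_origin(url)
        self.store[url] = body  # keep a copy on the way back down
        return body


if __name__ == "__main__":
    regional = Cache("regional")
    local = Cache("local", parent=regional)

    def origin(url: str) -> str:
        return "body of " + url

    for attempt in (1, 2):
        path: list[str] = []
        local.get("/index.html", origin, path)
        print("request", attempt, "consulted:", path)
```

The same chain is walked for every URL, so each additional level adds a hop to every miss, which is the latency drawback noted above.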
Bang Networks has developed a service to serve real-time information over the Internet without having to refresh Web pages. Bang Networks uses a network of proprietary routers which maintain persistent connections to browsers. The routers store information about user sessions and information flows and use that stored information to route data. Bang customers, or content providers, feed real-time information to the proprietary network and the information is routed through the network to the customers' Web pages. In order to use the service, content providers must modify HTML tags in their documents. As this network “scales” to reach more users, this solution becomes extremely expensive and, as noted above, requires content providers to modify their content.
Another approach to reducing Web page service latency and network burden is Edge Side Includes (ESI), developed by Akamai Technologies, Inc. and Oracle Corporation. ESI is a markup language that describes cacheable and non-cacheable components of Web pages. By using this approach, only non-cacheable components of a Web page need be fetched from the Web site; the cacheable components may be stored at the edge of the network. In order to use this approach, a content provider must rewrite its content to be compatible with ESI and send static data to the service provider.
Most information is transmitted over the Internet in TCP/IP packets. This can be inefficient since numerous round trips are required to open and close each TCP connection. Although HTTP 1.1 now allows for persistent connections, persistent connections are not available on all servers. Additionally, HTTP persistent connections do not support simultaneous requests for inlined objects, which are part of most Web pages. The objects are typically requested one at a time by the browser.
A multiplexing protocol, SMUX Protocol Specification, WD-mux-19980710, has been proposed by W3C which allows multiple objects to be fetched from a Web server approximately simultaneously over a single TCP connection. A TCP connection is multiplexed underneath HTTP. This approach allows sockets to be multiplexed on one socket. However, each individual message is often packaged in its own TCP/IP packet, so small packets may still be transmitted.
The transmission of short TCP/IP packets also creates service latency. Each message or file that is transmitted over a TCP/IP connection is contained in a TCP/IP packet with a 40 byte header. If each message, no matter how short, is sent in its own TCP/IP packet, both bandwidth and transmission time are wasted on headers. For instance, if a 1 byte message is sent via a TCP/IP packet, 41 bytes of data are transmitted for every 1 byte of useful data. If the network is heavily loaded, the congestion resulting from transmission of these small packets can result in lost datagrams, which must then be retransmitted. In addition, the network is slowed by this congestion and connections may be aborted.
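The overhead arithmetic above can be checked with a minimal sketch. It uses only the 40 byte header figure stated in the text and ignores lower-layer framing; the names `HEADER_BYTES`, `bytes_on_wire` and `efficiency` are chosen for illustration.

```python
HEADER_BYTES = 40  # combined TCP/IP header size stated in the text


def bytes_on_wire(payload_sizes: list[int], header_bytes: int = HEADER_BYTES) -> int:
    """Total bytes transmitted when each payload travels in its own packet."""
    return sum(size + header_bytes for size in payload_sizes)


def efficiency(payload_size: int, header_bytes: int = HEADER_BYTES) -> float:
    """Fraction of a single packet that is useful data."""
    return payload_size / (payload_size + header_bytes)


if __name__ == "__main__":
    # One hundred 1-byte messages, each in its own packet...
    separate = bytes_on_wire([1] * 100)
    # ...versus the same 100 bytes repackaged into a single packet.
    combined = bytes_on_wire([100])
    print("separate packets:", separate, "bytes")
    print("single packet:   ", combined, "bytes")
    print("useful fraction of a 1-byte packet: {:.1%}".format(efficiency(1)))
```

Under these assumptions, combining small messages amortizes one header over many payload bytes instead of paying 40 bytes per message.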
One potential solution to this problem, proposed by Nagle in IETF RFC 896, is to delay sending new TCP segments when new data arrives from a user if any previously transmitted data on the connection remains unacknowledged. Basically, the packet is held to accumulate data. When the acknowledgment for previously transmitted data arrives, or when a packet is filled, the packet is transmitted. However, many short packets may still be transmitted using this algorithm.
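The hold-and-accumulate rule described above can be illustrated with a minimal simulation. It is not a TCP implementation: it assumes a single acknowledgment covers all outstanding data, uses a hypothetical maximum segment size `mss`, and records segments in a list instead of sending them. The class name `NagleSender` is chosen for illustration.

```python
class NagleSender:
    """Toy model of the send-side rule: hold small data while data is unacknowledged."""

    def __init__(self, mss: int = 536):
        self.mss = mss              # assumed maximum segment size, in bytes
        self.buffer = bytearray()   # data accepted from the user but not yet sent
        self.unacked = False        # True while previously sent data awaits an ACK
        self.sent: list[bytes] = [] # segments "transmitted", for inspection

    def write(self, data: bytes) -> None:
        """New data arrives from the user."""
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        """Simplification: one ACK acknowledges everything outstanding."""
        self.unacked = False
        self._try_send()

    def _try_send(self) -> None:
        # A filled segment is always transmitted.
        while len(self.buffer) >= self.mss:
            self._emit(self.mss)
        # A partial segment goes out only if nothing is unacknowledged.
        if self.buffer and not self.unacked:
            self._emit(len(self.buffer))

    def _emit(self, n: int) -> None:
        self.sent.append(bytes(self.buffer[:n]))
        del self.buffer[:n]
        self.unacked = True


if __name__ == "__main__":
    sender = NagleSender(mss=8)
    for ch in (b"a", b"b", b"c"):
        sender.write(ch)   # only the first byte goes out immediately
    sender.on_ack()        # the held bytes are then sent together
    print(sender.sent)
```

The first write on an idle connection is still sent as a short packet, which is consistent with the observation above that short packets are not eliminated.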
An object of this invention is to improve the speed of data communication in a network by minimizing the bandwidth needed and reducing communication latency. Unlike any of the inventions of the prior art, the current invention can achieve compression factors of 50 times or more on real-world dynamically generated web pages and achieves minimum latency with minimum overall system loading by utilizing various technologies such as caching relay hubs, persistent connections between all computers, asynchronous protocols, and re-packaging small TCP requests into a single packet.