The Internet has, of late, become extremely popular. The origins of the Internet date back several decades to a U.S. government-sponsored military/research/business wide area network (WAN) that was designed to remain operational even in the event of a catastrophe, e.g., a major earthquake or a nuclear war. To accomplish this goal, robust protocols and systems were developed which allowed a geographically distributed collection of computer systems to be connected as a WAN such that the loss of a particular computer, or group of computers, would not preclude continued communication among the remaining computers.
While the use of the Internet has been prevalent for many years now, its use has been limited by the arcane and difficult commands required to access the various computers on the network. To address this problem, a protocol known as the "World Wide Web" or "WWW" was developed to provide an easier and more user-friendly interface for the Internet. With the World Wide Web, an entity having a domain name creates a "web page" or "page" which can provide information and, to a limited degree, some interactivity.
A computer user can "browse", i.e. navigate around, the WWW by utilizing a suitable web browser and a network gateway (e.g., an Internet Service Provider (ISP)). For example, UUNET, America Online, and Global Village all provide Internet access. Currently, the most popular web browser, known as Netscape® Navigator®, is made by Netscape Corporation of Mountain View, Calif. The web browser allows a user to specify or search for a web page on the WWW, and then retrieves and displays web pages on the user's computer screen.
The Internet is based upon a transmission protocol known as "Transmission Control Protocol/Internet Protocol" (or "TCP/IP" for short), which sends packets of data between a host machine, e.g. a server computer on the Internet, and a client machine, e.g. a user's personal computer connected to the Internet. The WWW is an Internet interface protocol which is supported by the same TCP/IP transmission protocol. Intranets are private networks based upon Internet standards, and have become quite common for managing information and communications within an organization. Intranets, since they subscribe to Internet standards, can use the same web browser and web server software as used on the Internet.
With the rapid proliferation of the Internet and Intranets, much attention has been given to performance issues. In particular, the issue of "bandwidth", i.e. the rate at which data can be moved within the network system, has been the focus of a considerable amount of research and development. As an example, many users still connect to the Internet through a 28.8 Kbit/second modem. This can make the "downloading" of large amounts of information (e.g., photographs, graphics, video, and audio) painfully slow. There are times when a complex web page, including a number of high resolution images or other objects, can take several minutes to download from the host machine to the client machine.
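By way of illustration only, the effect of a 28.8 Kbit/second link on download time can be estimated with simple arithmetic. The sketch below assumes an ideal link with no protocol overhead or compression; the function name and the 500 KB page size are hypothetical values chosen for the example and do not come from any measurement.

```python
def download_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


MODEM_BPS = 28_800            # the 28.8 Kbit/second modem mentioned above
PAGE_BYTES = 500 * 1024       # hypothetical page with several large images

seconds = download_seconds(PAGE_BYTES, MODEM_BPS)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

Under these assumptions the page needs roughly two and a half minutes, which is consistent with the "several minutes" noted above for complex pages.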
The prior art addresses bandwidth limitations in the Internet and Intranets by, essentially, increasing bandwidth. This can be accomplished through higher speed data transmission and through compression techniques. For example, instead of using standard analog telephone lines, much faster digital ISDN or T1 lines can be used. However, these faster digital telephone lines can be quite expensive. Higher speed modems, such as cable modems, are also under development. In addition, compression techniques can make a standard 28.8 Kbit/second modem appear to transmit data at twice that speed.
While increasing bandwidth, improving data compression, etc. have been helpful in improving Internet and Intranet performance, other performance-robbing characteristics of the Internet and Intranets have only been partially addressed. One example is the "latency" problem, where TCP/IP packets are routed through a number of routers, and perhaps servers or other devices (collectively referred to herein as "nodes"), on their journey between the host machine and the client machine, and where each node adds its own delay. Another example is the "connect" problem, wherein each connection between host machine and client machine introduces a sometimes considerable delay.
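The latency and connect problems can be illustrated with a simple additive model. The sketch below is an assumption made for illustration: it treats the path as symmetric, so the request and the response each incur the sum of the per-node delays, and it adds a single connect delay per connection. The function names and all delay values are hypothetical.

```python
from typing import Sequence


def one_way_latency_ms(node_delays_ms: Sequence[int]) -> int:
    """Each node on the path adds its own delay; the total is their sum."""
    return sum(node_delays_ms)


def request_time_ms(node_delays_ms: Sequence[int], connect_delay_ms: int) -> int:
    """Connect delay, plus the request going out and the response coming back.

    Assumes the same nodes are traversed in both directions.
    """
    return connect_delay_ms + 2 * one_way_latency_ms(node_delays_ms)


# Hypothetical path through four nodes and a hypothetical busy host.
NODE_DELAYS_MS = [20, 35, 15, 30]
CONNECT_DELAY_MS = 400

total_ms = request_time_ms(NODE_DELAYS_MS, CONNECT_DELAY_MS)
print(total_ms)  # 600
```

Note that neither term depends on the bandwidth of the link, which is why adding bandwidth alone leaves these delays in place.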
In FIG. 1, a simplified representation of an Internet 10 and a client machine 12 is shown. Client machines are typically personal computers made to the IBM-PC standards and running a Microsoft Windows operating system, Apple Macintosh computer systems, or workstations made by Sun Microsystems, Hewlett Packard Company, and the like. Client machine 12 is coupled to the Internet 10 by a data link 14, such as an analog or digital telephone line (with appropriate modem and/or other interface). The client typically makes its initial connection with an Internet Service Provider (ISP) 16A, which is connected to the Internet 10 with one or more data links 18. The ISP 16A is one form of "node" 16. Nodes on the Internet comprise computers of various sizes and types, although they mostly tend to run some variant of the Unix operating system. There are nodes on the Internet that are personal computers, workstations, minicomputers, and mainframe computers, as well as specialized computers known as routers and switches. A host computer 16B (which is another form of node 16) resides somewhere within the Internet 10, and may be directly coupled to the ISP 16A or may be coupled to ISP 16A via a number of data links 18 and nodes 16. The various nodes are computers that can be used to route TCP/IP packets towards their final destination. Intranets are designed in a similar fashion to the illustrated Internet 10.
A typical web browsing session is as follows, whether on the Internet, on an Intranet, or on a combination of the two. A user of a client machine 12, such as a personal computer, starts a web browser application program. The manufacture and use of computers, such as client machine 12 and host machine 16B are well known to those skilled in the art.
As mentioned previously, the Netscape Navigator web browser is currently the most popular web browser. The browser is used to connect the client machine 12 to the Internet 10 via the ISP 16A. The client machine and the ISP 16A communicate with each other by means of the aforementioned well-known TCP/IP protocol. When the ISP 16A detects a connection request from the client machine 12 in the form of a "Uniform Resource Locator" or URL 20, the connection request is routed by the Internet 10 to the host 16B providing that URL. The host 16B receives the connection request and the URL of the desired page, and transmits the page to the ISP 16A in TCP/IP packets. The ISP then sends a page 22 in Hyper-Text Mark-Up Language (HTML) to the client 12. Most web browser software caches recent pages on an associated hard disk 24 so that if the same URL is requested in the future, the data will be quickly and locally available to the client computer 12.
The connection process to a host machine on the Internet or Intranet can be quite time consuming. For example, a busy Internet site can take several seconds or even minutes to create a connection with a client machine, particularly during peak traffic times. This is due, in part, to the inherent latency of the connection process through the Internet but is more often due to the relative slowness of the host machine to respond to a connection request. This is because the host machine can only respond to a finite number of simultaneous requests for connections from client machines.
To partially address this connection delay problem, most web browsers, as mentioned previously, allow for caching of recently accessed pages on the hard disk 24 of the personal computer. For example, with reference to FIG. 1, the client machine 12 will "cache" the web page 22 on its hard disk 24 so that if a subsequent request is made for the web page 22, it will be immediately and locally available.
The connection and caching process of the prior art will be discussed in greater detail with reference to FIGS. 2, 3a, and 3b. In FIG. 2, a web page 22 is written in HTML code 26, which is a declarative, high-level computer language. A browser parses and interprets the HTML code 26 to generate the desired image on the computer screen or other page effect. The HTML code 26 begins with a "start of file" code 28, namely "<HTML>." The HTML code 26 ends with an end of file ("eof") code 30, namely "</HTML>." The body of the HTML code 26 includes a number of HTML commands including the image commands "IMG" 32.
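As a minimal sketch of how a program might locate the image commands in such a page, the following example uses the `html.parser` module of the Python standard library. The sample page, the file names in it, and the names `ImageCollector` and `embedded_image_urls` are hypothetical and are not taken from FIG. 2.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the SRC value of every IMG command encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_urls.append(value)


# Hypothetical page: starts with <HTML>, ends with </HTML>, two IMG commands.
SAMPLE_PAGE = """<HTML>
<BODY>
<IMG SRC="logo.gif">
<P>Welcome</P>
<IMG SRC="photo.jpg">
</BODY>
</HTML>"""


def embedded_image_urls(html: str) -> list[str]:
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.image_urls


print(embedded_image_urls(SAMPLE_PAGE))  # ['logo.gif', 'photo.jpg']
```

Each URL found this way is a separate object that the browser must obtain before the page can be fully displayed.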
In FIG. 3a, a prior art browser "Acquire URL" process 36 is illustrated. The process 36 can be initiated ("evoked") by, for example, providing the URL to the web browser, by using a web browser "bookmark," etc. Once the Acquire URL process 36 has begun, it is determined whether the requested web page is in the cache of the client machine in a decision step 38. If the web page corresponding to the URL is in the cache, it is fetched in a step 40. Again, the cache is typically stored on a hard disk 24 connected to the client machine 12. Then, a "Process Page" process 42 is evoked, after which the process 36 is completed as indicated at 44. The process 42 will be described in greater detail subsequently.
If the desired page is not in the cache as determined by step 38, a connection to the appropriate host machine is opened in a step 46. This connection goes through the Internet service provider 16A and through an indeterminate number of Internet nodes 16 before connecting with the host 16B. The page corresponding to the URL is then retrieved byte-by-byte from the host 16B in a step 48. If an eof is detected in a step 49, the connection between the client machine and the host machine 16B is closed in a step 50 and the process is completed at step 44. Otherwise, the page bytes are processed by the Process Page process 42 as they arrive in step 48. Alternatively, and more simply, the process can be thought of as opening a connection, retrieving a page, processing the page, and closing the connection.
The Process Page process 42 of FIG. 3a is illustrated in greater detail in FIG. 3b. The process 42 retrieves, in a step 54, the next HTML component to be processed. As used herein, an "HTML component" is a piece of HTML code that is parsed and interpreted by the web browser to control the image and other page effects displayed upon the client machine 12, or to provide another web browser control function. Next, in a step 56, if the end of file ("eof") is encountered, the process is completed as indicated at 58. If an eof has not been encountered, it is determined whether there is an embedded URL in the HTML component. If so, the process 36 "Acquire URL" is evoked. If there is not an embedded URL, or after the URL has been acquired by process 36, the HTML component is processed in a step 62. After the completion of the step 62, process control is returned to step 54 to retrieve the next HTML component of the page 22.
It should be noted that the processes 36 and 42 are recursive in that process 36 can call process 42 and process 42 can call process 36. Therefore, there can be several nested layers of URL acquisitions, each of which would require a time-consuming connection between the client machine 12 and the host machine 16B.
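The mutual recursion between the Acquire URL process 36 and the Process Page process 42 can be sketched as follows. This is a simplified illustration under stated assumptions, not the flowcharts themselves: the host is simulated by an in-memory dictionary rather than a network connection, the cache is a dictionary standing in for hard disk 24, a page is retrieved whole rather than byte-by-byte, and embedded URLs are found with a regular expression. The class name, the page contents, and the connection counter are hypothetical.

```python
import re

# Hypothetical host content: URL -> page text.
HOST_PAGES = {
    "index.html": '<HTML><IMG SRC="a.gif"><IMG SRC="b.gif"></HTML>',
    "a.gif": "GIF-A",
    "b.gif": "GIF-B",
}


class Browser:
    def __init__(self, host_pages: dict[str, str]) -> None:
        self.host_pages = host_pages
        self.cache: dict[str, str] = {}   # stands in for the cache on hard disk 24
        self.connections = 0              # counts simulated connections to the host

    def acquire_url(self, url: str) -> None:
        """Sketch of process 36."""
        if url in self.cache:                 # decision step 38
            page = self.cache[url]            # step 40: fetch from the cache
        else:
            self.connections += 1             # step 46: open a connection
            page = self.host_pages[url]       # step 48: retrieve the page
            self.cache[url] = page            # keep a copy; connection then closes
        self.process_page(page)               # process 42

    def process_page(self, page: str) -> None:
        """Sketch of process 42. Assumes pages do not embed themselves."""
        for component in re.findall(r"<[^>]+>", page):   # step 54: next component
            match = re.search(r'SRC="([^"]+)"', component, re.IGNORECASE)
            if match:
                self.acquire_url(match.group(1))         # recursion into process 36
            # Step 62 (interpreting the component for display) is omitted here.


browser = Browser(HOST_PAGES)
browser.acquire_url("index.html")
print(browser.connections)  # 3: one for the page and one for each image
```

Requesting the same URL a second time in this sketch opens no further connections, because every object is then served from the cache.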
In FIG. 4, prior art "server" software running on, for example, host machine 16B is illustrated. In a step 66, the process 64 waits for a connection request and URL from a client machine. Of course, other processes may be running on the host machine while waiting for a connection request and a URL. Once a connection request and a URL have been received by the host machine, a step 68 returns the requested page and the connection is closed in a step 70. Process control is then returned to step 66.
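A minimal sketch of such a server loop, using the standard `socket` module, is given below. It is not an HTTP implementation: as an assumption made for brevity, the client sends the URL as a single line of text and the server answers with the raw page bytes and then closes the connection. The page table, the function names, and the use of the loopback address are hypothetical.

```python
import socket
import threading

PAGES = {"/index.html": b"<HTML><BODY>Hello</BODY></HTML>"}  # hypothetical content


def read_line(conn: socket.socket) -> str:
    """Read bytes until a newline arrives or the peer closes."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(1024)
        if not chunk:
            break
        buf += chunk
    return buf.decode("ascii").strip()


def serve(listener: socket.socket, pages: dict[str, bytes], max_requests: int) -> None:
    for _ in range(max_requests):             # control returns to step 66 each time
        conn, _addr = listener.accept()       # step 66: wait for a connection request
        with conn:                            # leaving the block closes it (step 70)
            url = read_line(conn)             # the URL accompanying the request
            conn.sendall(pages.get(url, b""))  # step 68: return the requested page


def fetch(port: int, url: str) -> bytes:
    """Client side: one connection per URL, read until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(url.encode("ascii") + b"\n")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
        return b"".join(chunks)


def demo() -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        listener.listen()
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve, args=(listener, PAGES, 1), daemon=True)
        server.start()
        page = fetch(port, "/index.html")
        server.join()
        return page
```

Calling `demo()` starts the loop for a single request and returns the bytes of the hypothetical page. Because the loop in this sketch handles one connection at a time, a second client would wait until the first has been served.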
From the foregoing discussion, it will be apparent that the retrieval of a web page can result in multiple connections being made between a client computer 12 and a host 16B. For example, if a web page includes a dozen distinct images, at least a dozen connections to the host computer 16B must be made. If the host 16B is busy, causing a server responsiveness delay, or if there is high network traffic on the Internet 10, causing a latency delay, very poor performance can be experienced by the user of the client machine.
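The cumulative cost of these connections can be estimated with a short calculation. The sketch below assumes that connections are made one after another, that none of the objects is in the cache, and that each connection incurs the same connect delay and latency; it counts one connection for the page itself in addition to one per embedded image. The function name and the delay values are hypothetical.

```python
def page_connection_overhead(
    embedded_objects: int, connect_delay_s: float, latency_s: float
) -> tuple[int, float]:
    """Return (number of connections, total delay in seconds) for one page."""
    connections = 1 + embedded_objects        # the page plus each embedded object
    return connections, connections * (connect_delay_s + latency_s)


# A dozen images, with hypothetical delays of 0.5 s to connect and 0.25 s latency.
connections, overhead_s = page_connection_overhead(12, 0.5, 0.25)
print(connections, overhead_s)  # 13 9.75
```

In this example nearly ten seconds are spent on connection overhead alone, before any time needed to transfer the image data is counted.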