The Internet is a large-scale wide-area network connecting a rapidly growing number of sites. It consists of a communications protocol and an addressing scheme that allow any two computers on the Internet to communicate with each other. This backbone is implemented in several layers, allowing specific types of communication between a wide variety of systems. Internet sites capable of the File Transfer Protocol (ftp) respond to a particular type of communications request by exposing a list of files and directories. Internet sites capable of the Hypertext Transfer Protocol (http) provide access to specific documents containing text marked up with predefined formatting commands, which lay out the text and include pointers to other http documents or to graphic images.
Several other protocols exist, and others may develop as the Internet grows. The present invention is directed to ftp and http sites, but is intended to handle other protocols as well. The term "World Wide Web", or simply "Web", refers collectively to the Internet sites responding to these protocols, and specifically to the http-compatible sites.
In order to access the Internet, a user must have a direct or indirect connection to the Internet hardware backbone. The backbone is provided by a number of large Internet sites at private service provider companies, universities and government institutions. These sites accept data destined for a specific Internet address and route each data packet to its destination in accordance with a predefined routing protocol.
Smaller Internet sites can be part of the routing mechanism, thus providing a hierarchical network of continuously decreasing bandwidth. At the bottom end, individual end-users access a local site through a modem or other connection. The individual end-user's computer is then considered to have its own Internet address, and data can be "routed" to the end-user's computer. Small networks can also be connected to the Internet through several methods, allowing all users on a small network to access the Internet.
Computers connected to the Internet can be "Servers", which generally respond to ftp or http requests, and/or "Clients" or "Browsers", which primarily let users access information provided by a Server. Such Browser software requests a site address from the user and accesses the site, presenting the user with whatever information is made available by the Server at the site the user selected.
FIG. 1 shows the general operation of an Internet Browser. In 101 the user requests access to a particular site (either directly or by defining a "home" site that is always displayed first). In 102 the Browser looks up the Internet protocol (ip) address of the site holding the requested data and contacts the remote site. In 103 the data is requested. It is retrieved at 104 and presented to the user at 105.
For ease of use, users are not required to enter the actual numeric Internet address (or ip address) of the site they wish to contact. Instead, the Internet contains, at various service provider locations, Domain Name Servers (DNSs). These DNSs contain databases that map the names of sites to the Internet addresses of the machines providing the resources associated with those names. The site name forms the host portion of an address written according to the Uniform Resource Locator (URL) specification. Thus, the user need only know the URL of the site to be accessed, and the Browser software will query a local DNS for the actual address of the named site.
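The name-to-address step can be illustrated with Python's standard library. This is a minimal sketch, not any particular Browser's implementation: it assumes the URL carries a host name, and the helper names `host_from_url` and `resolve_url` are hypothetical. `socket.gethostbyname` hands the name to the operating system's resolver, which consults the configured DNS when needed.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return the host-name part of a URL, e.g. 'www.example.com'."""
    host = urlsplit(url).hostname  # lowercased; port and path are dropped
    if host is None:
        raise ValueError(f"URL has no host component: {url!r}")
    return host


def resolve_url(url: str) -> str:
    """Look up the numeric IPv4 address for the URL's host.

    socket.gethostbyname passes the name to the system resolver, which
    queries the configured Domain Name Server(s) when necessary.
    """
    return socket.gethostbyname(host_from_url(url))
```

For example, `host_from_url("http://www.example.com/index.html")` yields `"www.example.com"`; passing the same URL to `resolve_url` returns whatever IPv4 address the local resolver reports for that name.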
A known method (see Mosaic Web Browser or Netscape version 1.0) includes the ability to cache documents that have been accessed previously, whether in the same or a previous session. (The CompuServe Interface Manager for Windows, WinCIM, also provides a cache of certain bitmaps and documents.) This method speeds up subsequent accesses to the same document: prior to retrieving a document, the Browser may check the local cache to see whether the document exists locally and whether it has been modified since it was copied. FIG. 2 diagrams the operation of this method. In 201 the user indicates which site to contact, as usual. In 202 the site is contacted, and the Browser determines whether the data has been modified, via a last-modified time stamp, checksum or other procedure. In 203 the Browser determines whether the current version of the data is available locally. If not, the data is requested (204), retrieved (205) and copied into the local cache (206). The appropriate data, either the local or the retrieved copy, is then displayed (207).
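The FIG. 2 flow can be sketched as a small function. This is an illustrative sketch under stated assumptions: `head` and `fetch` are hypothetical callables standing in for the network operations (returning, respectively, a change indicator such as a last-modified time stamp or checksum, and the full document), and the cache is an in-memory dictionary rather than persistent storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    validator: str  # e.g. a last-modified time stamp or a checksum
    body: bytes


def get_document(
    url: str,
    cache: Dict[str, CacheEntry],
    head: Callable[[str], str],     # hypothetical: current validator reported by the site
    fetch: Callable[[str], bytes],  # hypothetical: retrieve the whole document
) -> bytes:
    current = head(url)                       # 202: has the data been modified?
    entry = cache.get(url)
    if entry is not None and entry.validator == current:
        return entry.body                     # 203: current copy is available locally
    body = fetch(url)                         # 204, 205: request and retrieve
    cache[url] = CacheEntry(current, body)    # 206: copy into the local cache
    return body                               # 207: the caller displays this
```

Note that the site is still contacted on every access (step 202); only the transfer of an unchanged document is saved.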
This is a useful methodology and can be implemented together with the present invention. This technique, however, only provides faster access to documents that have been accessed previously and have not changed since. Thus, it cannot improve access to data that changes frequently. Also, if it is applied across multiple sessions, this technique tends to consume large quantities of local storage for the document cache: to provide any improvement, the cross-session cache must be large enough to hold a meaningful portion of what the user may see, and so the method is not generally practical.
In another prior art method (see Dr. Dobb's Journal, April 1996, "The Harvest Object Cache", and references therein), the problem addressed is speeding up the overall response time of the Internet and reducing the load on it. This is accomplished via distributed caching at the local or regional network level.
FIG. 5 is a simplified diagram of this system. Local networks 503 are connected to a regional network 502, which in turn is connected to a wide area network backbone 501. A typical data request (506) is initiated at a local workstation and flows through the local network and a regional network to the network backbone, then to another regional network, and finally to another local network. The response follows the reverse path back to the user.
In this method, cache storage 504 is added at various points in the network, and frequently-accessed documents are kept in duplicate in the cache storage and accessed through data path 505. The response to a data request is thus quicker when the requested data can be retrieved from the cache 504, as the request need not be passed down to the appropriate local network and back up again.
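The lookup order in FIG. 5 can be modeled as a list of caches ordered from nearest to farthest. This is a minimal sketch, assuming (as a simplification not stated in the source) that every cache on the return path keeps a copy of the response; a real deployment would keep only frequently accessed documents. `origin` is a hypothetical callable standing in for the remote local network that holds the data.

```python
from typing import Callable, Dict, List, Tuple


def hierarchical_get(
    url: str,
    caches: List[Dict[str, bytes]],  # nearest first, e.g. [local_cache, regional_cache]
    origin: Callable[[str], bytes],  # hypothetical: fetch from the site that holds the data
) -> Tuple[bytes, int]:
    """Return (body, level): the index of the cache that answered, or len(caches) for the origin."""
    for level, cache in enumerate(caches):
        if url in cache:
            body = cache[url]
            for nearer in caches[:level]:  # copy into the nearer caches on the way back
                nearer[url] = body
            return body, level
    body = origin(url)                      # miss everywhere: go to the remote site
    for cache in caches:                    # simplification: every cache keeps a copy
        cache[url] = body
    return body, len(caches)
```

The returned level shows how far the request traveled: 0 is a hit in the nearest cache, and `len(caches)` means the request went all the way to the remote site.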
This is a useful and laudable goal. However, the end-user is interested in the apparent performance of the end-user's own workstation and in the time spent actually connected to the Internet. The actual response time of the Internet therefore matters less if the response seems instantaneous to the user. In other words, the time the user's workstation took to access the information is immaterial, provided the information appears as soon as the user requests it. Furthermore, since the user is charged for connect time rather than message units, the user is concerned with the overall time spent logged on to the Internet rather than the time spent accessing particular objects.
This prior art caching methodology fails to satisfy three needs. First, it is of little assistance in accessing documents that change frequently (for instance, newspaper front pages) or documents that are accessed infrequently or have never been accessed before, as such documents would not be in a cache. Second, this method can be difficult to deploy, requiring cooperation between the regional networks, and it increases the cost of providing network services. Third, this method does not improve access speed if a low-bandwidth bottleneck exists between the user's workstation and the cache that contains the requested data.
Because of the varying bandwidths of the various sites on the Internet, and because of the indefinite nature of the connection between servers and browsers, there can be a varying degree of lag time between an access request and a response. The combination of several such lags can result in significant delay in responding to a user's request.
A third prior art method (see Microsoft Internet Browser for Windows 95) allows the user to take manual advantage of a slow-to-respond server by creating a separate, simultaneous network access. FIG. 3 diagrams this multi-session Internet access. The user initiates a first contact with a first site at 301, and the Browser behaves at 302-307 as described above for a normal cached network Browser. In this method, the user can, upon determining that response from the first contact is slow, initiate a second, independent network Browser session at 308, contacting some other site or data set. This causes the Browser to effectively clone the first session at 309 and provide a second session at 310-315.
This method does not provide any increased performance, actual or perceived, within any one session. It also requires that the user manually select the next site to contact, and so does not take advantage of the user's own slow response time, that is, the time the user spends reading. Nor does it provide any quicker response to subsequent accesses to a single site, and it requires that the user manage multiple sites, an often confusing task.
A fourth prior art method is demonstrated in the read-ahead caching functionality of the SmartDrive disk cache found in the MS-DOS operating system and in similar disk caching schemes such as Symantec Corp.'s NCACHE and Microsoft's VFAT, found in Windows 95. This prior art, diagrammed in FIG. 4, accesses a data set sequentially, taking advantage of the delay in retrieving the next sequential part of the data: the user or user program can begin to process the first item read while the system reads the next part. A data request at 401 is checked at 402 to determine whether the data has been cached. If so, it is accessed at 403. If not, a read request is initiated. In 406 the first part of the data is accessed. In 407 a background procedure is created to access more of the data. In 404 the next sequential data item (physically next on the disk in SmartDrive, or linearly next in the data set in Microsoft's VFAT) is read in the background. The read data is then copied locally at 405.
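The read-ahead idea of FIG. 4 can be sketched with a background thread. This sketch assumes a hypothetical `read_block(n)` function that reads block number `n` of a sequential data set; the comments map only loosely onto the FIG. 4 steps, and a production disk cache would bound its memory and handle read errors.

```python
import threading
from typing import Callable, Dict, Optional


class ReadAheadReader:
    """After serving block n, read block n + 1 in the background."""

    def __init__(self, read_block: Callable[[int], bytes]):
        self._read_block = read_block  # hypothetical slow sequential read
        self._cache: Dict[int, bytes] = {}
        self._lock = threading.Lock()
        self._pending: Optional[threading.Thread] = None

    def _prefetch(self, n: int) -> None:
        data = self._read_block(n)  # background read of the next block
        with self._lock:
            self._cache[n] = data   # keep the copy locally

    def _wait(self) -> None:
        if self._pending is not None:
            self._pending.join()
            self._pending = None

    def read(self, n: int) -> bytes:
        self._wait()  # let any in-flight read-ahead finish first
        with self._lock:
            data = self._cache.pop(n, None)  # already read ahead?
        if data is None:
            data = self._read_block(n)       # not cached: read it now
        # Start reading the next sequential block while the caller processes this one.
        self._pending = threading.Thread(target=self._prefetch, args=(n + 1,))
        self._pending.start()
        return data

    def close(self) -> None:
        self._wait()
```

Reading blocks 0 and 1 in order costs one foreground read; block 1 is already in the cache when requested, and block 2 is being read in the background.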
This prior art method is not appropriate for an Internet Browser, because the time required to display or process a retrieved data item is very small compared with the time required to actually retrieve it. Thus, accessing the next sequential part of the same data set in the background provides no improvement. Furthermore, the data is usually requested all at once, so every request for a data item is made at the moment it is needed, leaving nothing for a read-ahead to anticipate. Finally, to a certain extent this kind of traditional read-ahead is already part of a traditional Browser implementation: the data is usually displayed by a process separate from the one performing the I/O, so a de facto read-ahead on the entire data set is performed.
It is therefore a primary objective of the present invention to reduce the perceived delay in response to a user's request for remote information.
It is a further objective of the present invention to reduce the perceived delay in response without requiring dedicated local storage for inter-session caching.
It is a further objective of the present invention to reduce the perceived delay in response even for documents that are being accessed by the workstation for the first time, or have changed since they were accessed last.
It is a further objective of the present invention to allow the user to take advantage of the delayed response time of some sites, without having to manually manage accesses to multiple sites.
The present invention is an Internet Browser, which can be implemented either as an adjunct to existing Browsers or as a stand-alone, fully functional Browser. The present invention functions by multi-threading or directing requests to multiple sites, and by accessing all documents from a given site as soon as their addresses are known. It may also include user-determined preferences as to which addresses receive access priority, derived both heuristically, by monitoring user preferences, and from simple settings.
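One way to combine explicit settings with a heuristic when ordering pending requests is a simple additive score. This is purely an assumed illustration: the source does not specify a scoring rule, and the `settings` weights and the `visits` counter (how often the user has followed links to each host) are hypothetical inputs.

```python
from collections import Counter
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def prioritize(
    urls: Iterable[str],
    settings: Dict[str, int],  # hypothetical explicit weights per host, set by the user
    visits: Counter,           # hypothetical count of links the user followed per host
) -> List[str]:
    """Order pending retrievals so higher-priority addresses are requested first."""

    def score(url: str) -> int:
        host = urlsplit(url).hostname or ""
        return settings.get(host, 0) + visits[host]

    # sorted() is stable even with reverse=True, so ties keep document order.
    return sorted(urls, key=score, reverse=True)
```

With no settings and no history, every score is zero and the addresses are requested in the order they appear in the document.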
The present invention monitors each HTML page or other document currently being accessed and analyzes the references within it. It immediately and concurrently accesses the secondary documents referenced. Thus, when a document referencing several other documents is read, retrieval of the secondary documents it identifies begins immediately, before or while the original document is presented to the user. While the user examines the original document, the other documents will already have been retrieved to the user's system, or will be in the process of being retrieved, providing instant access as soon as the user selects the appropriate document. Such a procedure takes advantage both of the differing response times of different sites and of the "down time" caused by the user's relatively slow assimilation of the information as it is presented.
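The reference-analysis and concurrent-retrieval steps can be sketched with Python's `html.parser` and `concurrent.futures` modules. This is a minimal sketch, not the invention's implementation: `fetch` is a hypothetical retrieval callable (for example an HTTP GET), only `href` and `src` attributes are treated as references, and the caller supplies the thread pool. The function returns futures immediately, so the original page can be shown while the secondary documents are still arriving.

```python
from concurrent.futures import Executor, Future
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href (links) and src (e.g. images) attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)  # resolve relative references
                if absolute not in self.links:            # keep document order, skip duplicates
                    self.links.append(absolute)


def prefetch_references(
    base_url: str,
    html: str,
    fetch: Callable[[str], bytes],  # hypothetical retrieval function
    pool: Executor,
) -> Dict[str, Future]:
    """Start retrieving every document referenced by `html`; return without waiting."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    # Each submit() queues a retrieval that worker threads start as soon as they are free.
    return {url: pool.submit(fetch, url) for url in parser.links}
```

When the user selects a link, calling `result()` on its future returns the document at once if it has already arrived, or waits only for the remaining transfer time.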