1. Field of the Invention
The invention relates to a technique, specifically apparatus and accompanying methods for use therein, that optimally, through continual computation, uses available computer resources including but not limited to periods of low processing and low network activity, such as idle time, for prefetching web pages, or pre-selected portions thereof, into local cache of a client computer. This technique, particularly though not exclusively suited for use in a web browser, utilizes, e.g., a probabilistic or statistical user model to specify, at any one time, those pages that are to be prefetched given information including, e.g., a web page currently being rendered to a user, content and structure of that particular web page, a history of web pages visited by the user, user background, and user actions. In addition, advantageously, this technique prematurely terminates a current information download for the user in favor of prefetching a web page of future interest to that user whenever the latter page exhibits greater incremental utility to the user than does continuing the current download.
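By way of illustration only, the comparison of incremental utility described above can be sketched as follows. The sketch assumes, hypothetically, that a user model supplies for each candidate page a probability that the user will request it next, and that each download has an estimated utility per unit of download time. The names `Download`, `incremental_utility` and `choose_next_download`, the product formula, and all numeric values are assumptions made for this example and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Download:
    url: str
    # Assumed output of a user model: chance the user wants this content.
    # For the page already requested by the user this is taken as 1.0.
    probability: float
    # Assumed utility gained per second of further downloading.
    value_rate: float


def incremental_utility(item: Download) -> float:
    """Expected utility per second of spending bandwidth on this item."""
    return item.probability * item.value_rate


def choose_next_download(current: Download,
                         candidates: Iterable[Download]) -> Download:
    """Pick what to download next.

    The current download is abandoned in favor of a prefetch candidate
    only if that candidate has strictly greater incremental utility.
    """
    best = max(candidates, key=incremental_utility, default=None)
    if best is not None and incremental_utility(best) > incremental_utility(current):
        return best
    return current


# Illustrative values: the remainder of the current page adds little value,
# while one linked page is likely to be requested next.
current = Download("http://example.com/current", probability=1.0, value_rate=0.2)
candidates = [
    Download("http://example.com/a", probability=0.6, value_rate=1.0),
    Download("http://example.com/b", probability=0.3, value_rate=1.0),
]
chosen = choose_next_download(current, candidates)
print(chosen.url)
```

In this sketch, a current download whose remaining content is of low value yields to the likeliest candidate, whereas a current download of high remaining value continues uninterrupted.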
2. Description of the Prior Art
Currently, Internet usage, and particularly that of the World Wide Web (henceforth referred to as simply the "web"), is growing explosively, particularly as the number of web sites and users that have access to the Internet continues to rapidly expand.
In essence, after establishing a suitable network connection to the Internet, a user can easily employ a graphical web browser, such as the Internet Explorer ("IE") browser presently available from Microsoft Corporation of Redmond, Washington, to connect to a web site by simply supplying an address (known as a URL or uniform resource locator). The URL identifies both the location of the site and a page of information at that site. Each web site stores at least one, and oftentimes substantially more, pages, all arranged in a pre-defined hierarchy. Pages, in this context, refer to content accessed via a URL, including, e.g., text, graphics, and other information. Once a user supplies a URL, the browser sends an appropriate command to the site storing that page to access and download that page; the site then sends a file containing information for that page. As the file is received by the browser, the browser assembles and displays the page on a monitor for the client computer. Once the content associated with the page is fully or sufficiently rendered, the user can then point his(her) mouse to a suitable hypertext link, button or other suitable user input field (whichever here implements a "hot-link") displayed on that page and then, through, e.g., a mouse "click", effectively download and display another desired page, and so on in succession, until the user has finished his(her) visit to that site. A hot-link specifies an address of an associated page, regardless of the web site at which that page is situated. Consequently, by simply and successively pointing and "clicking" his(her) mouse at an appropriate hot-link for each one of a number of desired web pages, the user can readily retrieve each desired page in succession from its corresponding web site and effortlessly jump from site to site, regardless of where those sites are physically located.
While a considerable amount of information can be downloaded from a web site for display by a browser, in practice, various factors exist which retard the speed at which the content from a page or from successive pages can be displayed on a client computer--often to the frustration of a user situated there.
One such factor lies with the nature of traditional personal computing itself. Specifically, personal computing applications, such as web browsers, word processors and spreadsheets, heavily rely on continual interactivity between the user and the computer. Inasmuch as a human being provides information, i.e. an entry, into a personal computer at a substantially slower rate than a rate at which the computer is able to accept and process the entry, computing activity in a personal computer is typically characterized by bursts of user input associated with relatively high processing or network activity, during which a user entry is being actively processed or information is being communicated, interspersed with intervals, usually considerably longer, of relatively low activity during which the computer waits for another user entry. Oftentimes, during the latter intervals, relatively low priority tasks, such as background or overhead processing of one form or another, execute or, if no such tasks then exist, the computer simply resides in an idle state pending receipt of a user entry. Hence, some degree of available processing or networking capacity may either exist and not be used by virtue of the computer or network simply idling, or be allocated to relatively unimportant (i.e. low priority) tasks that could readily be deferred so as to allocate computational effort to potential future tasks. In the case of executing a web browser, the amount of time a personal computer and network spend in processing and communicating a user request, such as entry of a URL and fetch and display of content from a web page associated therewith, may be shorter than an interval of time during which a user both examines content or a portion of the content from that page, once displayed, without providing any entry (i.e. "dwells" on the page) and then fully completes or selects a next entry, e.g. by clicking on a next successive hypertext link or button.
Moreover, a conventional web browser and a user collectively and principally operate on a time-staggered basis with no overlap, i.e. they operate serially. Operating in this manner, while procedurally rather simple, is rather inefficient and wasteful of processing time. Essentially, while the computer and the browser operate in a high activity state to process user input and, by doing so, fetch a page from a web site and then display that page, the user simply waits until the browser finishes assembling that page on the monitor or attempts to browse portions of a page that have been transmitted and rendered while a download continues. Page assembly completes either when the page is fully assembled and rendered, or when the user prematurely terminates further page assembly by suitably issuing a termination signal to the browser such as a "Stop" or "Back" command. In many situations, a user is only interested in reviewing a portion of total content associated with a URL, often being most interested in representative or overview material which is typically transmitted initially. Commonly, where a user searches for information that may reside on one or several URLs, among scores of potentially valuable pages, the user only spends time reviewing a small portion of content from a URL to decide whether the content associated with that URL is relevant. While the user dwells on content that has already been completely downloaded and displayed, and pending completion of any user input, e.g. while a mouse is being manipulated or data entered through a keyboard, the computer and browser are both operating in a low activity state (the former idling or executing low priority tasks, the latter idling) pending a signal from the user, typically a mouse click, that (s)he has completed entry of that input and requests that the input be suitably processed. 
In cases where all of the content associated with a URL is not yet downloaded, networking resources may be wasted retrieving additional components of a page that will not provide great value to the user. Thus, processing time may be wasted by the computer and browser in either simply waiting for successive user input once a current page has been assembled and rendered, or by using computer, network, and server resources to continue to retrieve additional information associated with a URL that may be of little or no value to the user.
Hence, in a personal computer environment, one might think that if a personal computer, and specifically a web browser executing there, could utilize available processing capacity that would otherwise be wasted (either by virtue of idling or being allocated to low priority tasks) to process future page requests in some fashion, the throughput of displayed pages could advantageously increase to the benefit of the user. However, the art appears to be devoid of any teachings that dictate just how this could be accomplished.
Therefore, a need exists in the art for a technique, specifically apparatus and accompanying methods, for use with a personal computer, and particularly a web browser executing there, that can process page requests using processing and networking capacity, available during intervals of relatively low activity, such as idle CPU or network capacity, that would otherwise be wasted, or that can allocate varying amounts of networking resources away from downloading and display of components of a requested URL and in favor of downloading content associated with potential future URL requests. Advantageously, use of such a technique is likely to significantly increase the rate at which pages are typically displayed to a user, thus reducing user frustration and increasing user satisfaction.