1. Field of the Invention
The present invention relates generally to a method for managing client and server access to electronically stored document repositories, and more particularly, to a method for determining which documents to prefetch and cache to improve document retrieval efficiency.
2. Description of Related Art
The World Wide Web (hereinafter the web) is an architectural framework for accessing documents (a term used interchangeably herein with web pages) stored on a worldwide network of distributed servers called the Internet. Documents stored on the Internet are defined as web pages. The architectural framework of the web integrates web pages stored on the Internet using links. Web pages may consist of elements that include, but are not limited to, text, graphics, images, video, or audio. A web page which points to the location of another web page is said to be linked to that other web page. Links that are set forth in a web page usually take the form of a text fragment or an image. A user follows a link by selecting it.
In order to speed up access to a document that is selected by a user following a link, client computers can prefetch and cache documents. The number of documents that can be prefetched and cached depends on the amount of available cache on a client or server computer. Generally, the available resources on a client or server computer are small compared with the number of documents available on the web. In other words, only a small fraction of the expansive number of documents available on the web can be cached locally on a client or server computer. Consequently, the better a client or a server computer is able to identify documents that are most likely to be needed by a user, the better the performance that user will experience while following linked documents or simply retrieving documents from the web.
When using caching, the client computer initially examines whether the requested document is in the local cache. If the document does exist in the local cache and it is current (where current means that a newer version of the document does not exist), then the document is immediately delivered to the user. Otherwise, if the document is not in the cache or is not current, the client computer fetches the document from a server located somewhere on the web. Depending on the document size and the available transmission rate, delivery of the document to the user could take a significant amount of time. The best way to optimize caching on a client computer is to define a set of documents that best predicts which documents are to be accessed by a user in the future. Those documents predicted to be in the set are stored in cache. Different methods for predicting which documents on the web best define the set of documents that should be cached are known.
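The lookup sequence described above can be sketched as follows. This is a minimal illustration only, not a definitive implementation: the names `ClientCache`, `CachedDocument`, `fetch`, and `latest_version` are hypothetical, and the integer version comparison is an assumed stand-in for whatever mechanism a client actually uses to decide that a cached document is current.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedDocument:
    content: str
    version: int  # assumed freshness marker; not specified in the text


class ClientCache:
    """Hypothetical client-side cache: serve current copies, else fetch."""

    def __init__(
        self,
        fetch: Callable[[str], CachedDocument],
        latest_version: Callable[[str], int],
    ) -> None:
        self._fetch = fetch                    # retrieves a document from a server
        self._latest_version = latest_version  # reports the newest known version
        self._store: Dict[str, CachedDocument] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        cached = self._store.get(url)
        # Deliver immediately only if the document is cached and current.
        if cached is not None and cached.version == self._latest_version(url):
            self.hits += 1
            return cached.content
        # Not cached, or a newer version exists: fetch from the server.
        self.misses += 1
        document = self._fetch(url)
        self._store[url] = document
        return document.content
```

In this sketch the cost of a miss is hidden inside `fetch`; a prediction method would aim to populate the cache ahead of time so that calls to `get` are hits.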
It is also known that the analysis of history (or past use) can be used to predict future use. Anderson et al., in "Reflections of the Environment in Memory," Psychological Science, 2, pp. 396-408, 1991, observed that specific mathematical laws can be used to predict future information needs from past events. The past events studied include news headlines, child language, and e-mail sources. In addition, Schooler et al., in "The Role of Process in the Rational Analysis of Memory," Cognitive Psychology, 32, pp. 219-250, 1997, found that these specific mathematical laws can be used to predict the result of controlled experiments on human memory. Furthermore, it has been found that these specific mathematical laws hold, to a good approximation, in predicting library circulation (see Burrell, "A Simple Stochastic Model for Library Loans," Journal of Documentation, 36, pp. 115-132, 1980) and in predicting web use (see Pitkow, "Characterizing World Wide Web Ecologies," Tech. Rep. UIR-R97-02, Palo Alto, Calif., 1997).
Caching on a client computer is beneficial because many such systems are portable and are operated while disconnected from the network. When a laptop, or the like, is disconnected from the network, the user of the client computer is unable to retrieve documents that are stored on remote servers on the network. If the need for those documents was properly anticipated by the client computer before being disconnected from the network, the user of the client computer would be able to continue working as though having never been disconnected from the network. A related problem on client computers that are disconnected from the network is relegation. Because client computers often have limited storage (e.g., hard drives), the space required to store documents anticipated to be used by the client computer must be created. Additional space is created by relegating (i.e., uploading) some of the files on the client computer to a less constrained storage device on the network.
Also known are computer programs that make browsing on the web more efficient. For example, the Web Wacker 2.0 by Blue Squirrel (found on the web at http://www.bluesquirrel.com/whacker/) is a utility that allows users to identify URLs to download (i.e., prefetch and cache) from the web onto client computers, and to specify the scheduling of those downloads (e.g., daily, weekly, etc.). Identification of the URLs to download can be performed at any time (e.g., while browsing other documents). Furthermore, the Web Wacker allows users to specify that downloads onto a client computer include URLs located within some specified depth of web links from the specified URLs. In addition, the Web Wacker can also be used on laptop computers for automatically downloading selected documents for later use before disconnecting from the network.
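The depth-limited selection of URLs described above amounts to a breadth-first traversal of the link structure. The following is a minimal sketch under stated assumptions, not a description of how the utility is actually implemented: the function name `urls_within_depth` is hypothetical, and the link structure is supplied as an in-memory mapping rather than discovered by downloading and parsing pages.

```python
from collections import deque
from typing import Dict, Iterable, List


def urls_within_depth(
    start_urls: Iterable[str],
    links: Dict[str, List[str]],
    max_depth: int,
) -> List[str]:
    """Return URLs reachable from start_urls in at most max_depth link hops.

    `links` maps each URL to the URLs it links to (an assumed, precomputed
    stand-in for fetching each page and extracting its links).
    """
    selected: List[str] = []
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    while queue:
        url, depth = queue.popleft()
        selected.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return selected
```

A depth of 0 selects only the URLs the user named, while each additional level also pulls in the pages those documents link to.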
Systems such as the Web Wacker, however, require users to specifically identify which documents are to be prefetched and cached or downloaded for later use. It would, therefore, be advantageous to provide a method for automatically predicting which documents are most likely to be needed by a user of a client computer. Using the predictions, documents can be prefetched (i.e., downloaded) from the network or relegated (i.e., uploaded) to the network, thereby efficiently managing a computer's available memory resources.
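To illustrate how predictions could drive both prefetching and relegation, the following sketch assumes that each document already has a numeric predicted-need score and a known size. The function name `plan_cache`, the score values, and the greedy highest-score-first policy are all assumptions made for illustration; the text above does not prescribe any particular selection rule.

```python
from typing import Dict, Set, Tuple


def plan_cache(
    scores: Dict[str, float],
    sizes: Dict[str, int],
    local: Set[str],
    capacity: int,
) -> Tuple[Set[str], Set[str]]:
    """Return (prefetch, relegate) for a store holding `capacity` units.

    scores: hypothetical predicted-need score per document (higher = more likely)
    sizes:  storage units each scored document occupies
    local:  documents currently held on the client
    """
    keep: Set[str] = set()
    used = 0
    # Greedy assumption: consider documents from highest to lowest score,
    # keeping each one that still fits in the remaining space.
    for doc in sorted(scores, key=lambda d: (-scores[d], d)):
        if used + sizes[doc] <= capacity:
            keep.add(doc)
            used += sizes[doc]
    prefetch = keep - local   # download from the network
    relegate = local - keep   # upload to less constrained network storage
    return prefetch, relegate
```

For example, with scores of 0.9, 0.5, and 0.1 for three equally sized documents and room for only two, the two highest-scoring documents are prefetched and the third, if held locally, is relegated.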