1. Technical Field
The present invention relates in general to accessing Web pages and in particular to accessing Web pages in environments including a server shared by multiple users. Still more particularly, the present invention relates to remote caching of Web pages on a file server utilized by multiple users.
2. Description of the Related Art
The Internet provides a valuable source of both entertainment and information to all segments of society. In addition to commercial enterprises utilizing the Internet as an integral part of their marketing efforts in promoting their products or services, many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Operating costs for both commercial enterprises and governmental agencies may be reduced by providing informational guides and/or searchable databases online.
Currently, the most commonly employed method of accessing and distributing data over the Internet is the World Wide Web (WWW) environment, also called simply "the Web." Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but they have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). Information is formatted for transfer and presentation to a user in a standard page description language, the Hypertext Markup Language (HTML).
In addition to basic formatting, HTML allows developers to specify "links" to other Web resources, identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to an Internet server containing specific logical blocks of information, colloquially called a "page," accessible to an Internet client. Web pages may be of arbitrary size and include text, graphics, forms for submitting queries to databases on the remote server, and other components. A "page" includes all files required to present the information requested utilizing the identifying URL, including text/HTML files, graphics files, sound files, etc.
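As an illustration only, the parts of a URL can be separated with Python's standard `urllib.parse` module. The function name `describe_url` and the example address are hypothetical and are not taken from the specification; this is a minimal sketch of how a URL names a server and a resource on that server.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the pieces that locate a page on a server."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # protocol used to reach the server
        "host": parts.hostname,    # server holding the page
        "port": parts.port,        # None when the URL gives no explicit port
        "path": parts.path or "/", # resource on that server
        "query": parts.query,      # optional parameters, e.g. a database query
    }


# Hypothetical example URL.
info = describe_url("http://www.example.com/guides/forms.html?year=1998")
```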
Retrieval of information on the Web is generally accomplished with an HTML-compatible "browser"--an application program capable of submitting a request for information identified by a URL--at the client machine. The request is submitted to a server connected to the client and may be handled by a series of servers to effect retrieval of the requested information. The information is provided to the client formatted according to HTML.
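The request a browser submits can be sketched as a plain HTTP message. The example below assumes a simple HTTP/1.0 GET request with a Host header; `build_get_request` is a hypothetical helper, and a real browser would send additional headers.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build the text of a minimal HTTP/1.0 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # A request is a request line, header lines, and a blank line,
    # each terminated by CRLF.
    lines = [f"GET {target} HTTP/1.0", f"Host: {parts.netloc}", "", ""]
    return "\r\n".join(lines)


# Hypothetical example URL.
request = build_get_request("http://www.example.com/index.html")
```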
When Web pages are retrieved under direct user control, it is common practice for contemporary Web browsers to cache pages accessed by the user. Network bandwidth is finite, and the time required to retrieve a Web page depends in part on the number of servers at the site from which the Web page is being retrieved. Furthermore, Web pages often include sizable graphics files or other large files requiring a substantial amount of time to transfer from the source to the requesting client. Caching Web pages allows a user to repeatedly view the information within a short span of time without retrieving the Web pages each time.
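The effect of browser caching can be sketched as follows. This is a simplified model, not the behavior of any particular browser: `PageCache` and `fake_fetch` are hypothetical names, the retrieval function is a stand-in for a network transfer, and expiry of cached pages is ignored.

```python
class PageCache:
    """Keep retrieved pages so that repeat views avoid another transfer."""

    def __init__(self, fetch):
        self._fetch = fetch   # function that retrieves a page from its source
        self._pages = {}      # URL -> cached page content
        self.misses = 0       # number of times the source had to be contacted

    def get(self, url):
        if url not in self._pages:
            self.misses += 1
            self._pages[url] = self._fetch(url)
        return self._pages[url]


def fake_fetch(url):
    # Stand-in for a slow network retrieval (assumption for this sketch).
    return f"<html>content of {url}</html>"


cache = PageCache(fake_fetch)
first = cache.get("http://www.example.com/")
second = cache.get("http://www.example.com/")  # served from the cache
```

In this sketch the second view does not call the retrieval function again, so `cache.misses` remains 1.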
Heavy traffic to specific Web sites can make access to such sites difficult. To ease this difficulty, Web browsers may retrieve frequently accessed Web pages by off-line browsing, in which information at the site is retrieved without contemporaneous user interaction at the client. The pages are typically retrieved from the originating Web site during off-peak periods, when traffic to the site is at a minimum. The retrieved pages are cached in local storage, such as a hard drive, for subsequent off-line viewing by the user without connection to the Web site from which those pages originate.
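Off-peak retrieval for off-line browsing can be sketched as a guarded prefetch. The off-peak window of 1:00 to 5:00 and the names `is_off_peak` and `prefetch` are assumptions made for illustration; the retrieval function is supplied by the caller, and a dictionary stands in for the local hard drive.

```python
def is_off_peak(hour, start=1, end=5):
    """Return True if hour (0-23) falls in the assumed off-peak window [start, end)."""
    return start <= hour < end


def prefetch(urls, fetch, store, hour):
    """Retrieve each URL into the local store, but only during off-peak hours.

    Returns the number of pages retrieved.
    """
    if not is_off_peak(hour):
        return 0
    for url in urls:
        store[url] = fetch(url)  # cached for later off-line viewing
    return len(urls)


store = {}
# Stand-in retrieval function; a real client would contact the Web site.
retrieved_at_peak = prefetch(["/news"], lambda u: "page " + u, store, hour=14)
retrieved_off_peak = prefetch(["/news", "/forms"], lambda u: "page " + u, store, hour=2)
```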
Where several users in an enterprise access and cache the same Web page or pages, it is inefficient for each user to cache these pages locally. Caching is also currently performed at proxies, but proxy caching does not scale well and may not benefit multiple users who browse the same or similar pages, because the cache cannot be shared among them. It would be desirable, therefore, to improve the storage of frequently accessed Web pages in order to improve the performance of an intranet, the Internet, and Internet service providers.
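The inefficiency of per-user caching can be illustrated by counting stored copies. The sketch below is a rough model under stated assumptions: the user names, paths, and the function `copies_stored` are hypothetical, and page sizes and expiry are ignored.

```python
def copies_stored(requests, shared):
    """Count cached page copies for a list of (user, url) requests.

    With per-user caches, every distinct (user, url) pair is stored once.
    With a shared cache, every distinct url is stored once.
    """
    if shared:
        return len({url for _, url in requests})
    return len(set(requests))


# Hypothetical access pattern: three users view the same page.
requests = [
    ("alice", "/news"),
    ("bob", "/news"),
    ("carol", "/news"),
    ("alice", "/forms"),
]
local_copies = copies_stored(requests, shared=False)
shared_copies = copies_stored(requests, shared=True)
```

Under these assumptions the per-user caches hold four copies where a shared cache would hold two, and the gap widens as more users view the same pages.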