1. Technical Field
The present invention relates in general to accessing Web pages and in particular to accessing Web pages in environments including multiple users. Still more particularly, the present invention relates to shared caching of Web pages among multiple users with minimal modifications to existing browsers.
2. Description of the Related Art
The Internet provides a valuable source of both entertainment and information to all segments of society. In addition to commercial enterprises utilizing the Internet as an integral part of their marketing efforts in promoting their products or services, many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Operating costs for both commercial enterprises and governmental agencies may be reduced by providing informational guides and/or searchable databases online.
Currently, the most commonly employed method of accessing and distributing data over the Internet is to employ the World Wide Web (WWW) environment, also called simply "the Web." Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). Information is formatted for transfer and presentation to a user by a standard page description language, the Hypertext Markup Language (HTML).
In addition to basic formatting, HTML allows developers to specify "links" to other Web resources, identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to an Internet server containing specific logical blocks of information, colloquially called a "page," accessible to an Internet client. Web pages may be of arbitrary size and include text, graphics, forms for submitting queries to databases on the remote server, and other components. A "page" includes all files required to present the information requested utilizing the identifying URL, including text/HTML files, graphics files, sound files, etc.
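As an illustration of how a URL identifies both the server and the resource held on it, the following minimal Python sketch splits a URL into its components using the standard library. The URL shown is a hypothetical example and does not come from the original text.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/docs/index.html?q=forms#top"

parts = urlsplit(url)

# The scheme names the protocol, the network location names the server,
# and the path names the resource ("page") held on that server.
print("protocol:", parts.scheme)
print("server:  ", parts.netloc)
print("resource:", parts.path)
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

The scheme and server together define the communications path, while the path, query, and fragment select the specific block of information on that server.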
Retrieval of information on the Web is generally accomplished with an HTML-compatible "browser"--an application program capable of submitting a request for information identified by a URL--at the client machine. The request is submitted to a server connected to the client system and may be handled by a series of servers to effect retrieval of the requested information. The information is provided to the client formatted according to HTML.
When Web pages are retrieved under direct user control, it is common practice for contemporary Web browsers to cache pages accessed by the user. Network bandwidth is finite, and the time required to retrieve a Web page depends in part on the number of servers at the site from which the Web page is being retrieved. Furthermore, Web pages often include sizable graphics files or other large files requiring a substantial amount of time to transfer from the source to the requesting client. Caching Web pages allows a user to repeatedly view the information within a short span of time without retrieving the Web pages each time.
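The benefit of browser-side caching described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the behavior of any particular browser: the names `PageCache` and `fake_fetch` are hypothetical, and the network retrieval is replaced by a stand-in function so that the example runs without a connection.

```python
from typing import Callable, Dict


class PageCache:
    """Hypothetical per-browser cache keyed by URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # function that retrieves a page
        self._pages: Dict[str, bytes] = {}
        self.hits = 0                    # requests served from the cache
        self.misses = 0                  # requests that required retrieval

    def get(self, url: str) -> bytes:
        if url in self._pages:
            self.hits += 1
            return self._pages[url]
        self.misses += 1
        body = self._fetch(url)          # slow path: retrieve the page
        self._pages[url] = body          # keep a copy for repeat views
        return body


def fake_fetch(url: str) -> bytes:
    """Stand-in for a network retrieval (assumption for illustration)."""
    return ("<html>" + url + "</html>").encode("ascii")


cache = PageCache(fake_fetch)
first = cache.get("http://www.example.com/")
second = cache.get("http://www.example.com/")   # served locally
print(cache.misses, cache.hits)
```

Only the first view of a page incurs the transfer; repeat views within the lifetime of the cache are served from the local copy.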
Large traffic demands on specific Web sites can make access to such sites difficult. The amount of time which a user must wait to view a Web page during peak utilization periods can be very long. To ease the difficulty of accessing sites with high traffic demands, Web browsers may retrieve frequently accessed Web pages by off-line browsing. Off-line browsing allows information at the site to be retrieved during off-peak periods without contemporaneous user interaction at the client. The pages are typically retrieved from the originating Internet Web site by off-peak retrieval, that is, retrieval during periods when traffic to the site is at a minimum. The retrieved pages are cached in a local memory, such as a hard drive, for subsequent off-line viewing by the user without connection to the Web site from which those pages originate.
Caching of Web pages is also performed at proxies. Typically, a local area network is segregated from external networks or systems by a firewall, a barrier designed to stop all data flow in either direction. Proxies, which are installed in addition to or as part of the firewall, handle data transfers between the local network and external sources, including Internet Web sites. Thus, caching in proxies, which serve an entire intranet, can benefit the entire local network. However, caching at a single machine, particularly at a machine that is serving a different purpose, is not scalable.
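The trade-off described above, in which a proxy cache benefits every user on the local network but does not grow with the number of users, can be illustrated with a minimal Python sketch. The class `CachingProxy`, the fixed `capacity`, the least-recently-used eviction policy, and the stand-in `fake_origin` function are all assumptions made for illustration; the original text does not specify how a proxy manages its cache.

```python
from collections import OrderedDict
from typing import Callable


class CachingProxy:
    """Hypothetical proxy holding one fixed-size cache for all clients."""

    def __init__(self, fetch_origin: Callable[[str], bytes], capacity: int) -> None:
        self._fetch_origin = fetch_origin
        self._capacity = capacity            # fixed, however many clients exist
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.origin_requests = 0             # retrievals from external sites

    def request(self, url: str) -> bytes:
        if url in self._cache:
            self._cache.move_to_end(url)     # mark as most recently used
            return self._cache[url]
        self.origin_requests += 1
        body = self._fetch_origin(url)
        self._cache[url] = body
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used page
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for retrieval from an external Web site (assumption)."""
    return url.encode("ascii")


proxy = CachingProxy(fake_origin, capacity=2)

proxy.request("http://a.example/")   # first client: retrieved from the origin
proxy.request("http://a.example/")   # second client: served by the proxy
shared = proxy.origin_requests

proxy.request("http://b.example/")
proxy.request("http://c.example/")   # exceeds capacity, evicts a.example
proxy.request("http://a.example/")   # must be retrieved from the origin again
total = proxy.origin_requests
print(shared, total)
```

In this sketch the second client benefits from the first client's retrieval, but once the working set of the whole network exceeds the proxy's fixed capacity, pages are evicted and must be retrieved again, regardless of how many machines sit behind the proxy.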
Another problem relates to the fees charged by service providers for maintaining a Web site (i.e., a Web page and supporting files) for companies or individuals. These fees may be high, depending on the services provided and the level of competition among service providers. With respect to the services offered, for example, the service provider may allocate disk space for the individual Web pages and supporting files, provide support personnel for maintaining the Web server, etc. Thus, the cumulative fees charged to companies and individuals for maintaining a Web site may become substantial over time. Furthermore, maintaining a Web site on a server located at premises controlled by the company or individual involves not only the front-end costs of purchasing and connecting the server, but also a high monthly fee for maintaining an uninterrupted connection to the Internet.
It would be desirable, therefore, to allow scaling of an available cache by the number of browsers in an intranet or local area network. It would further be advantageous if the mechanism allowing such scaling could be implemented with as little change as possible to existing browsers. It would further be advantageous if the mechanism provided could be employed to support inexpensive maintenance of Web pages for companies and/or individuals.