1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for caching data in a data processing system. More specifically, the present invention relates to a method and apparatus for caching documents containing dynamic content.
2. Description of Related Art
The Internet, also referred to as an “internetwork”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network (with packets if necessary). When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Providing informational guides and/or searchable databases of online public records may reduce operating costs. Further, the Internet is becoming increasingly popular as a medium for commercial transactions.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). The information in various data files is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify “links” to other Web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a “page” or a “Web page”, is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information, not necessarily for the user, but mostly for the user's Web “browser”. A browser is a program capable of submitting a request for information identified by a URL at the client machine. Retrieval of information on the Web is generally accomplished with an HTML-compatible browser.
Web content is often dynamic because of various changes made by developers and other users publishing or making available Web content, such as Web pages. Even static pages are occasionally updated. Web servers provide static content and dynamic content to various users. Static content contains data from files stored at a server. Dynamic content is constructed by programs executing at the time a request is made. The presence of dynamic content often slows down Web sites considerably. High-performance Web servers can typically deliver several hundred static pages per second. By contrast, the rate at which dynamic pages are delivered is often one or two orders of magnitude slower.
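By way of illustration only, the distinction between static content and dynamic content may be sketched as follows. The names STATIC_FILES, serve_static, and serve_dynamic, the path "/index.html", and the page bodies are hypothetical and are chosen solely for this example; they do not describe any particular Web server.

```python
import time

# Hypothetical in-memory stand-in for files stored at a server.
STATIC_FILES = {"/index.html": "<html><body>Welcome</body></html>"}


def serve_static(path):
    """Static content: return the stored data unchanged."""
    return STATIC_FILES[path]


def serve_dynamic(path, user, now=None):
    """Dynamic content: construct the page when the request is made.

    The result depends on who is asking and when, so two requests for
    the same path may yield different pages.
    """
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return f"<html><body>Hello {user}, generated at {stamp} UTC</body></html>"
```

In this sketch the static page is identical on every request, whereas the dynamic page requires program execution on every request, which is one reason dynamic pages may be delivered at a lower rate.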
Dynamic content is often present at a Web site in an effort to provide customized pages and updated information to the various users who may visit the site. The use of this type of content, however, may degrade the performance of the Web site.
Proxy caches are used to store data at sites that are remote from the server which originally provided the data. Proxy caches reduce network traffic and latency for obtaining Web data because clients can obtain the data from a local proxy cache instead of having to request the data directly from the site providing the data. This mechanism, however, does not work well with dynamic pages. One problem presented by caching dynamic pages in proxy servers is that it is essential for the cached pages to be current at all times.
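The difficulty described above may be illustrated by a minimal sketch, offered as an assumption-laden example rather than a description of any actual proxy server. The classes OriginServer and ProxyCache, the URL "/quote", and the price values are hypothetical. The proxy shown here never revalidates its stored copy, which is a deliberate simplification.

```python
class OriginServer:
    """Hypothetical server whose page depends on data that can change."""

    def __init__(self):
        self.price = 100
        self.requests = 0  # counts requests that reach the server

    def get(self, url):
        self.requests += 1
        return f"<p>Current price: {self.price}</p>"


class ProxyCache:
    """Hypothetical proxy that stores the first response for each URL."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url):
        # Only a cache miss is forwarded to the originating server.
        if url not in self.store:
            self.store[url] = self.origin.get(url)
        return self.store[url]


origin = OriginServer()
proxy = ProxyCache(origin)

first = proxy.get("/quote")   # miss: fetched from the origin and stored
origin.price = 105            # underlying data changes at the origin
second = proxy.get("/quote")  # hit: served from the proxy, now out of date
```

In this sketch the second request is satisfied without contacting the originating server, which reduces traffic and latency, but the page returned no longer reflects the data at the server.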
Therefore, it would be advantageous to have an improved or alternative mechanism for caching and handling dynamic content.