The Internet, also referred to as an “internetwork”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network (with packets if necessary). When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Providing informational guides and/or searchable databases of online public records may reduce operating costs. Further, the Internet is becoming increasingly popular as a medium for commercial transactions.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but these have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, and motion video). The information in various data files is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify “links” to other Web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a “page” or a “Web page”, is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information, not necessarily for the user, but mostly for the user's Web “browser”. A URL includes a Uniform Resource Identifier (URI). The URI is the portion of the URL which more specifically identifies a particular page to be displayed.
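The structure of a URL described above can be illustrated with a minimal sketch. The class name `UrlParts` and the example address are hypothetical, and treating the path plus query string as the page-identifying portion is an assumption made only for illustration; the sketch relies on the standard `java.net.URI` class.

```java
import java.net.URI;

// Illustrative helper (not from the text): splits a URL into the parts
// that locate the server and the part that identifies the page.
final class UrlParts {
    final String scheme;   // protocol used for the transfer, e.g. "http"
    final String host;     // server holding the information
    final String resource; // path (plus query, if any) identifying the page

    UrlParts(String url) {
        URI uri = URI.create(url);
        this.scheme = uri.getScheme();
        this.host = uri.getHost();
        String query = uri.getQuery();
        // Assumption: path and query together identify the page to display.
        this.resource = (query == null) ? uri.getPath() : uri.getPath() + "?" + query;
    }
}
```

For a URL such as `http://www.example.com/catalog/item.html?id=42`, the scheme and host describe the communications path, while the remaining portion selects the particular page on that server.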
A browser is a program capable of submitting a request for information identified by a URL at the client machine. Retrieval of information on the Web is generally accomplished with an HTML-compatible browser.
Web content is often dynamic. In the modern Internet, personalization of content to specific users and groups necessitates dynamic content, as does content that changes due to user actions (e.g., a shopping cart changes, though the request for that cart does not). Even static pages are occasionally updated. Web servers provide static content and dynamic content to various users. Static content contains data from files stored at a server. Dynamic content is constructed by programs, including such technologies as servlets, ASPs, and CGI, executing at the time a request is made. The presence of dynamic content often slows down Web sites considerably. High-performance Web servers can typically deliver several hundred static pages per second. By contrast, the rate at which dynamic pages are delivered is often one or two orders of magnitude slower.
Dynamic content is often present at a Web site in an effort to provide customized pages and updated information to various users that may visit the site. The use of this type of Web page, however, may cause a Web site to slow down in performance.
In the generic web application environment, dynamic content is generated (e.g., by executing a servlet) for every request. A dynamic web cache allows a dynamically generated page to be cached and later served in response to future requests without regenerating its output (that is, without executing that servlet again). The first time a request is made for dynamic content, the application executes the servlets necessary to display the page. The output of these servlets is typically HTML code, which is then presented to the user. Other types of output include XML and images such as GIFs and JPGs. When a user requests a page for the first time, the servlets execute and their output is stored as a cache entry. Each subsequent time the user requests this page, this cache entry is retrieved and presented to the user. When the page is to be refreshed, the application executes all of the servlets again to create a new cache entry.
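The whole-page caching behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation taken from any particular product: the class `PageCache`, its method names, and the use of a `Supplier` to stand in for servlet execution are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical whole-page cache keyed by request URL (illustration only).
final class PageCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    // Returns the cached page if present; otherwise runs the generator
    // (standing in for the servlets) once and stores its output.
    String get(String url, Supplier<String> generator) {
        return entries.computeIfAbsent(url, key -> generator.get());
    }

    // Refreshing a page: drop the entry so the next request regenerates it.
    void invalidate(String url) {
        entries.remove(url);
    }
}
```

In this sketch the first `get` for a URL runs the generator, later calls return the stored output, and `invalidate` forces the entire page to be regenerated on the next request.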
The method described above for caching dynamic content can be applied to entire pages, requested externally by users. This method is inflexible, and often inefficient, as whole pages are generally constructed from several dynamic fragments. Frequently, the content of only parts of a page may change. In these cases, valuable computing resources are wasted by regenerating those parts of the page which were not changed.
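The inefficiency noted above can be made concrete by contrasting it with a cache that works at the level of page fragments. The following is a hedged sketch only: the class `FragmentCache`, its methods, and the fragment names used with it are hypothetical, and each `Supplier` stands in for whatever program generates that fragment.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical fragment-level cache: a page is assembled from named
// fragments, and each fragment is cached and invalidated on its own.
final class FragmentCache {
    // Insertion order is kept so fragments render in registration order.
    private final Map<String, Supplier<String>> generators = new LinkedHashMap<>();
    private final Map<String, String> cached = new HashMap<>();

    void register(String name, Supplier<String> generator) {
        generators.put(name, generator);
    }

    // Only the named fragment will be regenerated on the next render.
    void invalidate(String name) {
        cached.remove(name);
    }

    // Assembles the page, regenerating only fragments with no cache entry.
    String render() {
        StringBuilder page = new StringBuilder();
        for (Map.Entry<String, Supplier<String>> e : generators.entrySet()) {
            page.append(cached.computeIfAbsent(e.getKey(), k -> e.getValue().get()));
        }
        return page.toString();
    }
}
```

With a whole-page cache, a change to one fragment (for example, a shopping cart) forces every fragment on the page to be regenerated; in this sketch, invalidating one fragment leaves the cached output of the others untouched.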
Some applications include a caching capability within the application itself. Each application may choose to implement caching in a unique manner. The form of the servlets will vary from one application to the next. In these systems, each servlet must know how to generate its own cache entry. Therefore, in order to change the way the dynamic content is cached, each servlet, in each application, must be changed. Further, with this approach, each application must provide for its own caching, which cannot be applied to other applications. Existing applications which do not currently provide for caching must be updated in order to permit caching.
In these systems, some servlets are cacheable, and some are not. When a servlet is cacheable, the servlet includes the information necessary to generate its cache entry.
Therefore, a need exists for a data processing system and method for specifying a caching policy for caching dynamic content including portions of pages and supporting both internal and external requests, where caching is executed separately from applications.