The Internet is a vast computer network consisting of many smaller networks spanning the globe. It is well-known "lore" that the Internet was started in the late 1960s as a development project of the U.S. Department of Defense to provide a backup communications system that would be virtually impossible to destroy in the event of a major catastrophe. The Internet has grown exponentially, and millions of private users and corporations now use it daily for all kinds of communications needs.
The World Wide Web (WWW) was developed in 1991 as an information system running over the Internet. The WWW is based on the concept of "hypertext" and a transfer method known as HTTP (Hypertext Transfer Protocol). HTTP is designed to run primarily over TCP/IP (Transmission Control Protocol/Internet Protocol), the networking protocol suite on which the Internet is built. One increasing use of the WWW is commercial: with recent improvements in secure transactions as well as graphical presentation, merchants can display and sell their goods and services over the Internet.
One way to transfer information over the WWW is to create documents using Hypertext Markup Language (HTML), a markup language that supports navigational linking ("hypertext links"). HTML is a structured language based on SGML (Standard Generalized Markup Language), a standard for defining document markup languages. Like SGML, HTML describes the structure of a document through a system of tags; HTML pages are made up of standard text together with formatting codes for headings, paragraphs, lists, tables and character styles that indicate how the page should be displayed. HTML includes a tag called a "link tag" (the anchor tag, <a>) that encodes nonlinear navigational links. One example of the use of HTML pages with navigational links in the context of business documents is described in U.S. Pat. No. 5,692,073 to Xerox Corporation for "Formless Forms and Paper Web Using a Reference-Based Mark Extracting Technique".
The WWW uses the Uniform Resource Locator (URL) to define the address of a particular page on the Internet. A URL consists of three parts: the transfer format (often "http") followed by a colon and two forward slashes (://), the name of the host machine that holds the file, and finally the path to the file on the host machine. In a typical piece of hypertext, the data stored in a hypertext link is a label pointing to a remote destination. This is expressed in HTML by embedding the address of the link destination, the URL, in the link tag.
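As a minimal sketch of the three-part URL structure and of embedding a URL in a link tag, the Python standard library's `urllib.parse.urlsplit` can separate the transfer format (scheme), host name and path. The helper names `url_parts` and `link_tag` and the example address are hypothetical; the sketch deliberately ignores ports, query strings and fragments.

```python
from html import escape
from urllib.parse import urlsplit


def url_parts(url: str) -> tuple[str, str, str]:
    """Return (transfer format, host machine, path on host) for a URL.

    Ports, query strings and fragments are ignored in this sketch.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.hostname or "", parts.path


def link_tag(url: str, label: str) -> str:
    """Embed a URL in an HTML anchor (link) tag, escaping both values."""
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'


if __name__ == "__main__":
    print(url_parts("http://www.example.com/catalog/mens/pants.html"))
    print(link_tag("http://www.example.com/catalog/mens/pants.html", "Pants"))
```

For the example address, `url_parts` yields `("http", "www.example.com", "/catalog/mens/pants.html")`, matching the transfer format, host and path described above.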
When a client accesses a web page, it does so through a software program called a browser, which establishes the connection with the server hosting the page. The server executes server software that returns to the client, in a transfer format (e.g., HTTP), a response containing the web page or other data generated by the server. As the web page is loaded on the client machine, the browser renders its text and graphics from the HTML data.
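To make the request/response exchange concrete, the sketch below builds the raw HTTP/1.1 GET request a browser might send for a URL and reads the status line of a server's response. It is a simplified illustration, assuming a plain HTTP/1.1 exchange; real browsers send many more headers, and the `ExampleBrowser/1.0` user-agent string and function names are hypothetical.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, user_agent: str = "ExampleBrowser/1.0") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"          # request target defaults to the root
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",        # HTTP/1.1 requires a Host header
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",                             # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Return (HTTP version, status code, reason phrase) from a raw response."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    fields = first_line.split(" ", 2)
    reason = fields[2] if len(fields) > 2 else ""
    return fields[0], int(fields[1]), reason
```

A server answering such a request with `HTTP/1.1 200 OK` followed by headers and an HTML body gives the browser what it needs to render the page.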
While HTML is used to deliver data on the web, most of the underlying information is not stored in HTML but in other, richer storage formats such as SGML, and in legacy systems such as databases. Data in these other formats must often be converted to HTML dynamically. Methods for converting files from SGML to HTML, including adding "anchors" (navigational links referencing other files) during the conversion, are discussed in U.S. Pat. No. 5,530,852 of Sun Microsystems, Inc., titled "Method for Extracting Profiles and Topics from a First File Written in a First Markup Language and Generating Files in Different Markup Languages Containing the Profiles and Topics for use in Accessing Data and Described by the Profiles and Topics", and in "HTML makes a great delivery vehicle for Web-based information. It just isn't a sensible place for much of that information to live in." by R. Light, Archives and Museums Informatics, vol. 9, no. 4, pp. 381-387, 1995.
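The kind of conversion described above can be sketched as follows. Because the Python standard library has no SGML parser, this example uses XML (an SGML-derived format) as a stand-in source; the `<catalog>`, `<topic>`, `<para>` and `<see>` element names and the one-HTML-file-per-topic naming are hypothetical. Each `<see>` reference becomes an anchor pointing at another generated file, which is not the method of the cited patent.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document in a richer structured format.
SOURCE = """<catalog>
  <topic id="pants" title="Pants">
    <para>Dress and casual pants.</para>
    <see target="belts"/>
  </topic>
</catalog>"""


def topic_to_html(topic: ET.Element) -> str:
    """Convert one <topic> element into an HTML fragment."""
    out = [f"<h1>{escape(topic.get('title', ''))}</h1>"]
    for child in topic:
        if child.tag == "para":
            out.append(f"<p>{escape(child.text or '')}</p>")
        elif child.tag == "see":
            # Cross-reference becomes an anchor to another generated file.
            target = child.get("target", "")
            out.append(
                f'<p><a href="{escape(target, quote=True)}.html">'
                f"See {escape(target)}</a></p>"
            )
    return "\n".join(out)


def convert(source: str) -> dict[str, str]:
    """Map each topic id to its generated HTML page."""
    root = ET.fromstring(source)
    return {t.get("id", ""): topic_to_html(t) for t in root.iter("topic")}
```

Running `convert(SOURCE)` produces a single page keyed `"pants"` whose last line links to `belts.html`.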
In a commercial web site, a store sells its products to potentially millions of customers on the Internet by displaying the products through HTML documents. A merchant commonly has thousands of products in its catalog. Manually creating and managing static HTML documents for navigating to and displaying this many products is tedious, error-prone and, in practice, nearly impossible.
A merchant server system helps a merchant manage catalog data and provides the support needed to sell products on the merchant's web site. In a merchant server system, the merchant's catalog data are commonly stored in a relational database. There are database tables for storing product information, tables for grouping related products into categories and related categories into higher-level categories, and tables for storing category information. When a shopper visits the merchant's web site from a browser, the merchant server accesses the data in the database through Structured Query Language (SQL) queries and dynamically generates HTML documents to show the category and product pages as the shopper navigates through the merchant's store. For example, U.S. Pat. No. 5,692,181 of NCR Corporation for "System and Method for Generating Reports from a Computer Database" discusses the problems associated with organizing interrelated data in database tables and with generating customized HTML documents (in this case, reports) from data stored in relational databases.
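A minimal sketch of this arrangement, using Python's built-in `sqlite3` module: the four tables mirror the ones described above (categories, category-to-category grouping, products, product-to-category grouping), and `category_page` issues SQL queries and builds the HTML on every request. The table and column names, the sample rows and the `/category/<id>` and `/product/<id>` link scheme are all hypothetical.

```python
import sqlite3
from html import escape

# Hypothetical schema mirroring the tables described in the text.
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category_rel (parent_id INTEGER REFERENCES category(id),
                           child_id  INTEGER REFERENCES category(id));
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      price_cents INTEGER NOT NULL);
CREATE TABLE product_category (category_id INTEGER REFERENCES category(id),
                               product_id  INTEGER REFERENCES product(id));
"""


def setup() -> sqlite3.Connection:
    """Create an in-memory catalog with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO category VALUES (?, ?)",
                   [(1, "Men's Wear"), (2, "Pants")])
    db.execute("INSERT INTO category_rel VALUES (1, 2)")
    db.execute("INSERT INTO product VALUES (10, 'Dress Pant', 5999)")
    db.execute("INSERT INTO product_category VALUES (2, 10)")
    return db


def category_page(db: sqlite3.Connection, category_id: int) -> str:
    """Query the catalog and dynamically generate a category page."""
    row = db.execute("SELECT name FROM category WHERE id = ?",
                     (category_id,)).fetchone()
    if row is None:
        raise LookupError(f"no category {category_id}")
    subcats = db.execute(
        "SELECT c.id, c.name FROM category c "
        "JOIN category_rel r ON r.child_id = c.id "
        "WHERE r.parent_id = ? ORDER BY c.name", (category_id,)).fetchall()
    products = db.execute(
        "SELECT p.id, p.name FROM product p "
        "JOIN product_category pc ON pc.product_id = p.id "
        "WHERE pc.category_id = ? ORDER BY p.name", (category_id,)).fetchall()
    items = [f'<li><a href="/category/{cid}">{escape(name)}</a></li>'
             for cid, name in subcats]
    items += [f'<li><a href="/product/{pid}">{escape(name)}</a></li>'
              for pid, name in products]
    return (f"<html><body><h1>{escape(row[0])}</h1>"
            f"<ul>{''.join(items)}</ul></body></html>")
```

Calling `category_page(setup(), 1)` yields the Men's Wear page linking to the Pants subcategory; every call repeats the queries, which is the cost discussed later in this section.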
In an electronic retail setting, a shopper usually enters the web site of, for example, a department store at the store's home page. From the home page, the shopper can click on a link to visit a top-level category such as the Men's Wear department. From the Men's Wear page, he can choose the Pants section among other links to second-level categories on the page. As the shopper navigates down the category hierarchy, he reaches a product page that shows a dress pant of a certain brand with the available sizes and colors. He can then pick the size and color he wants and order the pant. The merchant server takes him through the ordering pages, where he provides payment and shipping information. When the ordering steps are done, the order information is recorded in the database and the merchant can later use it to fulfill the order.
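The shopper's route through the category hierarchy can be illustrated with a small sketch that walks parent links from a product up to the home page and reverses the result, giving the navigation path (or "breadcrumb"). It assumes, for simplicity, that every category and product has exactly one parent; in the relational design described earlier a product may belong to several categories, so a real path would depend on how the shopper arrived. The ids in `PARENT` are hypothetical.

```python
# Hypothetical single-parent links: child id -> parent category id.
PARENT: dict[str, str] = {
    "mens-wear": "home",
    "pants": "mens-wear",
    "dress-pant-123": "pants",
}


def navigation_path(node: str, parent: dict[str, str]) -> list[str]:
    """Return the path from the root of the hierarchy down to `node`."""
    path = [node]
    seen = {node}
    while node in parent:
        node = parent[node]
        if node in seen:  # guard against malformed, cyclic catalog data
            raise ValueError(f"cycle in category hierarchy at {node!r}")
        seen.add(node)
        path.append(node)
    return path[::-1]  # reverse so the path starts at the home page
```

For the sample data, `navigation_path("dress-pant-123", PARENT)` gives `["home", "mens-wear", "pants", "dress-pant-123"]`, the same sequence of pages the shopper visits in the scenario above.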
While dynamically generating the category and product pages is desirable, because the merchant then needs to manage only the catalog information in the database, it consumes processing cycles in the merchant server to access the database and create the HTML pages the shopper wants to see. If the web site receives heavy traffic, this can significantly slow the shopping experience. Yet a category or product page is identical whether it is generated the first time or on any subsequent request, until the corresponding catalog data in the database is changed.
Saving the generated pages for subsequent access, and regenerating them only when the corresponding catalog data changes, can significantly reduce the load on the merchant server and improve system performance. Shoppers will see much better response times when navigating through category and product pages, because once the pages have been "cached" they can be served directly by the web site.
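The caching idea can be sketched as a simple in-memory page cache: a page is generated only on a cache miss, and later requests reuse the stored HTML until the entry is explicitly invalidated. This is an illustrative sketch, assuming a single process and a caller-supplied `generate` function (for example, one that runs the catalog queries); the `PageCache` class and its methods are hypothetical, and a production server might instead store pages on disk and handle concurrent requests.

```python
from typing import Callable


class PageCache:
    """Serve generated HTML pages from memory, generating on a miss."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate        # e.g. runs SQL and builds HTML
        self._pages: dict[str, str] = {}
        self.misses = 0                  # number of pages actually generated

    def get(self, page_id: str) -> str:
        page = self._pages.get(page_id)
        if page is None:
            self.misses += 1
            page = self._generate(page_id)
            self._pages[page_id] = page
        return page

    def invalidate(self, page_id: str) -> None:
        """Drop a cached page so the next request regenerates it."""
        self._pages.pop(page_id, None)
```

Requesting the same category page twice triggers only one generation; after `invalidate`, the next request regenerates it. Deciding *when* to invalidate is left to the caller here.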
However, one problem for the merchant server is maintaining the validity of the cached pages automatically, so that caching is completely transparent to the merchant, who manages the catalog data as usual. That is, when data in the database used by cached pages changes, the merchant server should preferably purge the invalid cached pages automatically and regenerate new ones as they are needed.
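One common way to make invalidation automatic, offered here only as an illustrative sketch and not as the method claimed in this document, is dependency tracking: when a page is generated, the cache records which catalog data items it used, and a change to any of those items purges every dependent page. The sketch assumes the page generator reports the data it read as hypothetical `(table, row id)` keys, and that the merchant server calls `on_data_changed` whenever catalog data is updated; both the granularity and the class name are assumptions.

```python
from collections import defaultdict
from typing import Callable

DataKey = tuple[str, int]  # hypothetical granularity: (table name, row id)


class DependencyTrackingCache:
    """Page cache that purges pages when the catalog data they used changes."""

    def __init__(self, generate: Callable[[str], tuple[str, set[DataKey]]]) -> None:
        # `generate` returns the page HTML and the set of data keys it read.
        self._generate = generate
        self._pages: dict[str, str] = {}
        self._deps: dict[DataKey, set[str]] = defaultdict(set)

    def get(self, page_id: str) -> str:
        if page_id not in self._pages:
            html, used = self._generate(page_id)
            self._pages[page_id] = html
            for key in used:  # remember which pages depend on which data
                self._deps[key].add(page_id)
        return self._pages[page_id]

    def is_cached(self, page_id: str) -> bool:
        return page_id in self._pages

    def on_data_changed(self, key: DataKey) -> set[str]:
        """Purge and return every cached page built from `key`."""
        purged = self._deps.pop(key, set())
        for page_id in purged:
            # Stale entries for this page under other keys are harmless:
            # popping an absent page later is a no-op.
            self._pages.pop(page_id, None)
        return purged
```

For example, if a category page and a product page both read product row 10, updating that row purges both pages, while pages for unrelated products stay cached; each purged page is regenerated from current data on its next request.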