The invention disclosed herein relates generally to the generation of web sites that contain web pages derived from databases and templates. More particularly, the present invention relates to a system and method whereby a content management system can quickly determine which web pages have become stale as a result of changes to the content contained in the underlying database. Once the system makes this determination, it re-creates the stale web pages with new content. The present system and method allow a business with a web presence to save time and resources because the content management system automatically detects stale web pages and then selectively re-creates only those pages with newer content, rather than also re-creating pages that are unaffected by changes to content in the database.
Database-driven content management systems are typically used to automate information management for large-scale, high-volume online operations. Such systems are capable of generating every page in a web site dynamically, i.e., at run time when the user requests the page. Dynamic pages are designed to display time-variant, user-dependent information. Examples of dynamic web pages are personalized mailboxes, customized order forms, and a web page designed to present a user with his or her favorite news topics. Because the information and layout of these types of pages change depending on when and by whom they are accessed, they must be generated at run time when the user requests the page. The content displayed by a dynamic page is stored in a database or other data management system.
A runtime engine interprets the instructions contained within the dynamic page. Web application servers such as WebLogic and WebSphere provide this runtime engine. The instructions and variables that make up the dynamic page may be written in a variety of programming languages, scripting languages, or both. Exemplary languages include JavaServer Pages (JSP), JavaScript, VBScript, JScript, ASP (Active Server Pages), Python, and Perl. A piece of program logic written in these languages is broadly known as a ‘template.’ Upon each request, the runtime engine interprets or executes these coded instructions, producing a stream of data. Requests are typically generated by a browser such as Internet Explorer or Netscape Navigator, and the result of execution, in the form of a stream of data, is transmitted back to the browser that requested it. The stream of data is generated in a format that Web browsers understand, e.g., HTML (Hypertext Markup Language) or XML (eXtensible Markup Language).
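As a rough illustration of this request-time step, the sketch below uses Python's standard `string.Template` as a stand-in for a JSP or ASP runtime engine: a single template containing placeholders is filled in from a per-user record each time a page is requested, producing a stream of HTML. The template text, the `render_mailbox` function, and the record fields `user` and `unread` are hypothetical and chosen only for illustration.

```python
from html import escape
from string import Template

# Hypothetical template: fixed HTML markup with placeholders filled in per request.
MAILBOX_TEMPLATE = Template(
    "<html><body><h1>Inbox for $user</h1><p>$count unread messages</p></body></html>"
)


def render_mailbox(record: dict) -> str:
    """Render the template for one user at request time (dynamic generation)."""
    # Escape the user-supplied value so it cannot inject markup into the page.
    return MAILBOX_TEMPLATE.substitute(
        user=escape(record["user"]),
        count=int(record["unread"]),
    )


# Two requests from different users yield different pages from the same template.
print(render_mailbox({"user": "alice", "unread": 3}))
print(render_mailbox({"user": "bob", "unread": 0}))
```

Because the page is computed on every request, its content can differ by user and by time, which is exactly the cost that static pages avoid.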
Dynamic generation, however, consumes computational resources and time. Moreover, dynamic generation is not required for web pages whose content does not vary with the user or over time, for example, a research publication. When dynamic generation is not required, web masters are better served by static pages. A static page is a physical file stored in a file system, typically with an .html, .shtml, or .htm extension. Upon request, such a static page is delivered to the requesting browser. Static pages are served by a web server such as Apache Web Server, IBM Web Server, or Internet Information Server (IIS). A web server is software capable of reading such files from the file system and transmitting them across the network to the requesting browser or an equivalent program. Most web servers employ sophisticated mechanisms, such as page caching, that reduce access times for static pages. Because serving a static page does not require computing the page at request time, static pages can be delivered with higher performance than dynamically generated pages.
The overhead involved in the use of static pages arises when their underlying data is modified. When the underlying data is modified, the static pages built from it become stale and must be regenerated to reflect the modifications. In a large web site consisting of thousands of static pages, identifying which of the static pages must be regenerated becomes critical. Without this identification, the system would often be forced to regenerate even those pages whose underlying data did not change, and this unnecessary work can offset the performance advantage of static pages over dynamic pages. In practice, a typical content management system cannot efficiently handle the republication of static pages in volume; when content in the database changes regularly, these changes can affect tens of thousands of static pages. Because the content management system cannot determine which static pages are affected by the changes, it either indiscriminately re-creates current pages along with the stale pages, or it requires a human operator to manually determine and specify which static pages to re-create. Manually determining which pages are stale and re-creating them can require hundreds of man-hours.
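To make the identification problem concrete, the following minimal sketch shows one common bookkeeping approach, assumed here for illustration and not necessarily the method of the present invention: when each static page is generated, the database records it reads are noted, and that mapping is inverted so that a set of changed records yields exactly the set of stale pages. The class `DependencyIndex`, its methods, and the record and page identifiers are hypothetical.

```python
from collections import defaultdict


class DependencyIndex:
    """Hypothetical reverse index from database record IDs to static page paths."""

    def __init__(self) -> None:
        # record ID -> set of static page paths whose content used that record
        self._pages_by_record: dict[str, set[str]] = defaultdict(set)

    def record_page(self, page_path: str, record_ids: list[str]) -> None:
        """Remember which records were read when a static page was generated."""
        for record_id in record_ids:
            self._pages_by_record[record_id].add(page_path)

    def stale_pages(self, changed_record_ids: list[str]) -> set[str]:
        """Return only the pages built from at least one changed record."""
        stale: set[str] = set()
        for record_id in changed_record_ids:
            # .get() avoids inserting empty entries for records no page uses
            stale |= self._pages_by_record.get(record_id, set())
        return stale


index = DependencyIndex()
index.record_page("/articles/a1.html", ["article:1", "author:7"])
index.record_page("/articles/a2.html", ["article:2", "author:7"])
index.record_page("/index.html", ["article:1", "article:2"])

print(sorted(index.stale_pages(["author:7"])))
# ['/articles/a1.html', '/articles/a2.html']
print(sorted(index.stale_pages(["article:2"])))
# ['/articles/a2.html', '/index.html']
```

Because the lookup touches only the index entries for the changed records, its cost grows with the number of changes and affected pages rather than with the size of the whole site; pages such as `/index.html` are left alone when none of their records change.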
There is thus a need for a system that can automatically and quickly determine which static web pages are stale. There is also a need for the system to selectively republish only the stale pages using current content, instead of republishing the entire site, and to optimize the republication process. Such a system would allow businesses to realize the high performance they expect when they choose static pages over dynamic pages.