1. Field of the Invention
The invention relates to processing of electronic documents. More particularly, the invention relates to a method of optimizing generation of web pages having dynamic content.
2. Description of Related Technology
Today's Internet websites must deliver an ever-increasing amount of dynamic web page content. Dynamic web page generation is the process whereby a server computer creates HTML “on the fly” to send to a client computer (a Web browser). Dynamic web pages differ from static web pages in that the content of a dynamic web page can be determined only at the moment a page request is received by the server computer. While a static web page might display a biography of Abraham Lincoln, content which can be created once and never changed thereafter, such a methodology would not be suitable for a web page which displayed the current price of oranges at five local supermarkets. The latter case requires that the server computer utilize dynamic information and compose that information into a web page to send to a client computer.
A common practice employed to aid in the creation of dynamic web pages is the use of HTML containing “tokens”, or “tokenized HTML”. A tokenized HTML file contains some never-changing static information, for example a page heading with the word “Welcome” in it, but also contains some dynamic or “live” areas; for example, an area after “Welcome” where the user's name is to be dynamically placed. This will allow each user to see a Web page that is customized for them. When Sally visits this web page she'll be greeted with a page title that says “Welcome Sally”, and when Joe visits this web page it will be titled, “Welcome Joe”. One of the major advantages of using tokens as placeholders for dynamic content is that they are extremely unobtrusive, allowing technical personnel such as programmers to make sure that dynamic content is placed in certain areas of the page without the necessity of embedding complicated source code in the HTML, which may be very confusing and distracting to someone such as a graphic designer, who is tasked with maximizing the page's aesthetic appeal.
To serve up dynamic web pages, a web server typically creates a dynamic page by loading a static HTML page that has a “token” or “placeholder” in the area where the user's name is to go. The tokens are of a known form, for example “@UserName@,” so that they may be searched for quickly and uniquely. The server searches the page for the tokens that refer to dynamic content, e.g. “@UserName@.” Once a token has been located, the server replaces its text with the dynamically discovered text, e.g. “Sally.” Replacing a token involves storing all of the text leading up to the token and concatenating it with the dynamic content and all of the text following the token. The server must do this for each request it receives (each dynamic page that each user asks for).
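The search-and-concatenate replacement described above can be sketched as follows. This is a minimal illustration only, not code from any of the systems discussed; the function name `replace_token` and the sample page are hypothetical, and Python is used purely for brevity.

```python
def replace_token(page: str, token: str, value: str) -> str:
    """Replace every occurrence of `token` in `page` with `value`.

    Hypothetical helper: it keeps the text leading up to each token,
    appends the dynamic value, and finally appends the remaining text.
    """
    if not token:
        raise ValueError("token must not be empty")
    pieces = []
    position = 0
    while True:
        found = page.find(token, position)
        if found == -1:
            # No more tokens: keep all of the text that follows the last one.
            pieces.append(page[position:])
            break
        pieces.append(page[position:found])  # text leading up to the token
        pieces.append(value)                 # dynamically discovered text
        position = found + len(token)        # resume searching after the token
    return "".join(pieces)


template = "<h1>Welcome @UserName@</h1>"
print(replace_token(template, "@UserName@", "Sally"))  # <h1>Welcome Sally</h1>
```

Every call scans the whole page and builds a new string, which is why repeating the work for each request becomes costly.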
Various methods of creating documents with varying content have been proposed. For example, J. Cooper, M. San Soucie, Method of generating document using tables storing pointers and indexes, U.S. Pat. No. 4,996,662 (Feb. 26, 1991) describe a document processing system having a system architecture that includes a control structure providing supervisory routines for controlling supervisory functions of the system and document manipulation routines for operating upon the documents.
R. Smith, D. Ting, J. Boer, M. Mendelssohn, Document management and production system, U.S. Pat. No. 5,181,162 (Jan. 19, 1993) disclose an object-oriented document management and production system in which documents are represented as collections of logical components that may be combined and physically mapped onto a page-by-page layout.
D. Dodge, S. Follett, A. Grecco, J. Tillman, Method and apparatus for document production using common document database, U.S. Pat. No. 5,655,130 (Aug. 5, 1997) describe a system and method for producing a variety of documents from a common document database. In the described system, source documents are decomposed into encapsulated data elements, in which a data element includes the actual content along with classifying data about the content. The encapsulated data elements are saved to a database, and can later be reassembled to form variation-specific documents.
All of the systems described above involve the decomposition of source documents into smaller components, storing the document components in a database and reassembling the document components to form different variations of the source document, or completely new documents. While these systems facilitate the building of variation-specific documents such as software documentation and other engineering documents, they only involve combining and recombining static elements in various ways. The disclosed systems do not provide any way of generating a document “on the fly” that incorporates dynamically discovered information. Furthermore, none of the systems described is concerned with optimizing the process of incorporating dynamic information into an online document by reducing the required computer resource usage.
Various other methods have been proposed for creating dynamic content in pages for delivery to a client over the Internet on the World-Wide Web (WWW). For example, JAVA SERVER PAGES from Sun Microsystems, Inc. of Menlo Park, Calif. or ACTIVE SERVER PAGES from Microsoft Corporation of Redmond, Wash. generate a page by having the page's Java or C++ server code write all of the page content to the client browser (the output stream). The major drawback of these solutions is that the server code and the page design (the HTML) are both contained in the same HTML file, making it extremely difficult for non-programmers (e.g. graphic artists) to use popular page design tools to modify the content on these pages.
The primary task of Internet Web server computers is to deliver content (Web pages) to client computers (Web browsers). These server computers are expected to perform these operations extremely rapidly because they may be besieged by thousands of client requests per second. For this reason, web developers attempt to reduce bottlenecks in the server software so that the server performs at its maximum capacity. The problem, then, arises when many tokens in many dynamic pages need to be repeatedly replaced with dynamic content. Though the “Welcome” example above contained only a single token, the reality is that dynamic Web pages are normally far more complex than in this example, and might have 20 or more tokens.
Without any optimization, on each request, the server would have to re-read the base HTML file from disk, search for and replace all occurrences of each token, and then write the newly created content stream to the client's return stream. The problem with this approach is that it is extremely time consuming. Even if the file is read from disk only once (i.e. it is cached), the act of replacing all occurrences of all tokens in the file is a very slow and very costly operation. It is so slow that it would likely be the primary bottleneck in the server software. Additionally, buying more hardware, bandwidth, etc. will not solve this problem because, no matter how many machines are running concurrently, on each client request each web server still has to re-read the page and re-replace its tokens.
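The unoptimized per-request flow described above might look like the following sketch, assuming the base HTML is already cached in memory. The names `CACHED_TEMPLATE` and `render_unoptimized`, the token names, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical cached base page: it is read once, so no disk access per request.
CACHED_TEMPLATE = "<h1>Welcome @UserName@</h1><p>@Fruit@: @Price@</p>"


def render_unoptimized(template: str, values: Dict[str, str]) -> str:
    """Build one response by searching for and replacing every token.

    Each token triggers a full scan of the page and creates a new copy of it,
    and none of that work is reused by the next request.
    """
    page = template
    for token, value in values.items():
        page = page.replace(token, value)  # one full pass over the page per token
    return page


# Two requests repeat the same scans, even though only the values differ.
for user in ("Sally", "Joe"):
    request_values = {"@UserName@": user, "@Fruit@": "Oranges", "@Price@": "$1.29"}
    print(render_unoptimized(CACHED_TEMPLATE, request_values))
```

With 20 or more tokens per page, the number of full-page scans grows with both the token count and the request rate, which is the cost the passage above identifies.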
There exists, therefore, a need in the art for a way to reduce the processing overhead required to parse the HTML code of a web page that requires the incorporation of dynamic content, in order to locate the areas, identified by tokens, wherein the dynamic content is to be inserted, and to replace the tokens with the dynamic content.