The present invention generally relates to delivering web pages over the Internet. More particularly, the present invention relates to caching web page components to enable improved web page delivery speeds and web site scalability.
A critical issue in conducting commerce via the Internet ("e-commerce") is scalability. E-commerce has experienced phenomenal growth during the past few years, and is expected to continue this upward trend for years to come. Some predictions claim that e-commerce revenues will exceed $1.3 trillion by 2003. Along with this growth in revenue come significant increases in web traffic. With the current state of Internet infrastructure technology, e-commerce web sites are having trouble supporting such extreme growth while maintaining an acceptable quality of service.
As the number of Internet customers increases, e-commerce companies are required to deliver web pages to tens of thousands of customers simultaneously. This requirement can place a great strain on the computing resources of e-commerce companies. Moreover, most e-commerce companies are delivering dynamic web pages to customers, rather than static web pages. Dynamic web pages enable the delivery of tailored information to a customer, because the dynamic web page is created on-the-fly, in response to a set of parameters associated with a particular customer, such as information related to the customer's buying habits or Internet browsing behavior.
Many web sites currently utilize application servers to dynamically generate Hypertext Markup Language (HTML) pages. Application servers execute scripts in order to generate (or create) these dynamic web pages. These scripts typically do a significant amount of work to generate a dynamic web page. For example, the application server may retrieve web page content from database systems (located locally or remotely), transform content (e.g., from Extensible Markup Language (XML) to HTML), and execute other business logic (e.g., personalization logic). In the absence of web page caching, the entire script must be executed for every request for a dynamic web page. When the same web page content is requested and generated repeatedly, this results in unnecessary load on the application server, and hence longer (and often unacceptable) response times for site visitors.
In order to reduce the overhead associated with dynamic page generation, pages (or portions of pages) may be cached in main memory. Caching requires that policies be established to govern the replacement of pages in the cache. Commonly used cache management policies are Least Recently Used (LRU) and Most Recently Used (MRU) cache policies, both of which have been used extensively in the operating systems context. However, these schemes are based purely on access history, and are often inappropriate in the context of the web.
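To make the contrast concrete, the following Python sketch shows how purely history-based LRU and MRU policies choose a victim when the cache is full. The class name `RecencyCache` and its methods are illustrative assumptions, not part of any described system; note that neither policy consults any prediction of future requests.

```python
from collections import OrderedDict
from typing import Hashable, List, Optional


class RecencyCache:
    """Fixed-capacity cache that evicts by access history only (LRU or MRU)."""

    def __init__(self, capacity: int, policy: str = "LRU") -> None:
        if policy not in ("LRU", "MRU"):
            raise ValueError("policy must be 'LRU' or 'MRU'")
        self.capacity = capacity
        self.policy = policy
        # Keys are kept ordered from least recently used to most recently used.
        self._items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: str) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        elif len(self._items) >= self.capacity:
            # LRU evicts the oldest entry (front); MRU evicts the newest (back).
            self._items.popitem(last=(self.policy == "MRU"))
        self._items[key] = value

    def keys(self) -> List[Hashable]:
        return list(self._items)


lru = RecencyCache(2, "LRU")
mru = RecencyCache(2, "MRU")
for cache in (lru, mru):
    cache.put("a", "A")
    cache.put("b", "B")
    cache.get("a")       # "a" becomes the most recently used key
    cache.put("c", "C")  # cache is full, so one key is evicted
```

After the same access sequence, LRU keeps `a` and `c`, while MRU keeps `b` and `c`; in both cases the decision depends only on past accesses.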
Today, many sites delivering dynamic web pages experience serious performance problems in terms of response times (i.e., the time to deliver a complete web page to a customer). For example, a recent study reports that the average page download time for the most popular auction sites is 6.30 seconds. Poor performance can be extremely detrimental to a web site's ability to successfully conduct commerce online.
Another study predicts that if a web page requires longer than eight seconds to deliver to a customer, then 30% of customers will abandon the web page request. Unacceptable web page delivery delays are a known cause of customer abandonment (i.e., customer attrition). It has been estimated that a one-second improvement in page loading time (e.g., from 6.30 seconds to 5.30 seconds) can reduce the abandonment rate from 30 percent to about 7 percent. Yet another study indicates that customer attrition attributable to abandonment may cost the online business community upwards of $100 million per month.
To solve the attrition problem, many e-commerce providers are increasingly adopting dynamic page generation technologies to display content, owing to the significant flexibility these technologies afford designers in delivering custom content to users. However, dynamic page generation, while flexible, comes at a cost. As explained previously, the scalability of web and application servers ("web/app servers") is significantly reduced because pages are now generated "on demand", which places additional load on the web/app servers to retrieve and format the requested content. Consequently, even under moderate traffic loads, page generation slows down significantly.
One use of dynamic web page generation involves the promotion of related products. For example, a visitor at an online book web site may travel down the link path: Fiction-Thriller-Legal Thriller. If it is known (e.g., through accumulated empirical data on Internet browsing behavior) that customers who travel down this link path are statistically likely to also be interested in fusion jazz (knowledge of this type is widely accumulated today), then the next web page presented to the customer might include a component referencing fusion jazz.
One solution, the eGlue Server, manufactured and marketed by Chutney Technologies, Inc. of Atlanta, Ga., is designed to quickly establish relationships between a customer's browsing behavior and previously accumulated empirical behavior data, so that such statistical knowledge can be utilized for various purposes, such as tailoring a web site to a customer's tastes or determining which items in a cache to replace. The eGlue Server (also called a profile server) is designed to provide such predictive knowledge to requesting applications. Given this context, the eGlue Server, immediately upon the detection of the Legal Thriller click by the visitor, would recommend that the page delivered to the user contain an e-coupon from Tower Records with a discount in the fusion jazz category. Note that while the above example is in the context of delivering advertisements and promotions, the same functionality could be used in the context of caching Web content. In particular, the predictive knowledge could be used in the cache replacement policy for Web content that is cached.
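The kind of statistical knowledge described above can be illustrated with a toy estimator. The sketch below assumes, purely for illustration, that a profile server counts which item followed each recent link path in previously observed sessions and reports the empirical conditional probability P(next item | path) as a hint. The class `ToyProfileServer`, its `observe` and `hint` methods, the three-step context window, and the sample sessions are all hypothetical; the eGlue Server's actual method is not described here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class ToyProfileServer:
    """Hypothetical profile server: estimates P(next item | recent link path)."""

    def __init__(self, context_len: int = 3) -> None:
        self.context_len = context_len
        # Maps a recent link path to counts of the items that followed it.
        self._counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def observe(self, session: List[str]) -> None:
        """Accumulate empirical browsing data from one visitor session."""
        for i in range(1, len(session)):
            context = tuple(session[max(0, i - self.context_len):i])
            self._counts[context][session[i]] += 1

    def hint(self, path: List[str], top_n: int = 1) -> List[Tuple[str, float]]:
        """Return up to top_n (item, conditional probability) pairs for a path."""
        counts = self._counts.get(tuple(path[-self.context_len:]))
        if not counts:
            return []
        total = sum(counts.values())
        return [(item, n / total) for item, n in counts.most_common(top_n)]


# Made-up sessions, used only to exercise the estimator.
server = ToyProfileServer()
for s in (["Fiction", "Thriller", "Legal Thriller", "Fusion Jazz"],
          ["Fiction", "Thriller", "Legal Thriller", "Fusion Jazz"],
          ["Fiction", "Thriller", "Legal Thriller", "Mystery"]):
    server.observe(s)
top = server.hint(["Fiction", "Thriller", "Legal Thriller"])
```

Here two of the three sessions that followed the Legal Thriller path went on to fusion jazz, so the hint reports that item with probability 2/3.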
Given the extreme growth in Internet traffic as well as the increasing use of dynamic page generation technologies, there is a need in the art to improve web and application server scalability. Scalability can be improved by caching components of dynamic web pages. Further improvements in performance may be obtained by utilizing predictive information in the cache replacement policy.
The present invention solves the above-identified problems by providing a preloader that works in conjunction with a web/app server and, optionally, a profile server to cache web page content elements (or components) for faster on-demand and anticipatory dynamic web page delivery. The preloader uses a cache manager to manage requests for the retrieval, insertion, and removal of web page components in a component cache. The preloader uses a cache replacement manager to manage the replacement of components in the cache. The cache replacement manager may utilize any cache replacement policy. However, a particularly effective replacement policy utilizes predictive information to make replacement decisions. Such a policy uses a profile server, which provides a means of predicting a user's next content request. The components that can be cached are identified by tagging them within the dynamic scripts that generate them. The preloader caches components that are likely to be accessed next, thus improving a web site's scalability.
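One way a predictive replacement policy might be realized is sketched below in Python, assuming each cached component carries a probability supplied by the profile server and a last-access time. The victim is the component least likely to be needed next, with recency (as in LRU) used only to break ties. The names `PredictiveCache`, `Entry`, and `update_prob` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    content: str
    nav_prob: float  # predicted probability the component is needed soon (from a hint)
    timestamp: int   # logical clock value of the last access


class PredictiveCache:
    """Hypothetical component cache whose replacement policy evicts the entry
    least likely to be needed, breaking ties by least recent access."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[str, Entry] = {}
        self._clock = 0

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.timestamp = self._tick()
        return entry.content

    def update_prob(self, key: str, nav_prob: float) -> None:
        # Refresh a cached entry's probability when a new hint arrives.
        if key in self.entries:
            self.entries[key].nav_prob = nav_prob

    def put(self, key: str, content: str, nav_prob: float) -> Optional[str]:
        """Insert a component; return the key of the evicted entry, if any."""
        victim = None
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries,
                         key=lambda k: (self.entries[k].nav_prob, self.entries[k].timestamp))
            del self.entries[victim]
        self.entries[key] = Entry(content, nav_prob, self._tick())
        return victim


cache = PredictiveCache(capacity=2)
cache.put("header", "<div>header</div>", nav_prob=0.9)
cache.put("jazz_coupon", "<div>coupon</div>", nav_prob=0.6)
evicted = cache.put("legal_list", "<ul>...</ul>", nav_prob=0.7)
```

Unlike LRU, which would evict `header` here because it is the oldest entry, this policy evicts `jazz_coupon` because its predicted probability is lowest.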
In one aspect of the present invention, a web page delivery system is provided for dynamically generating a web page having at least one content element. The web page delivery system has a web/app server that receives a web page request from a user, generates a web page, and delivers the web page to the user. The web page delivery system also has a preloader that receives content element retrieval requests from the web/app server and delivers content elements to the web/app server, in response to receiving the content element retrieval requests. The web page delivery system also has a profile server that receives hint requests from the preloader and delivers hints to the preloader. The preloader has a component cache, maintains the content elements in the component cache, and delivers the content elements to the web/app server in response to a determination that the hint indicates that the content elements will be needed by the web/app server to generate the web page.
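The interaction among the three parties can be sketched as follows, under the assumption that a hint is a list of (component identifier, probability) pairs and that the preloader preloads every component whose probability meets a fixed threshold. The `Preloader` class, the `threshold` parameter, and the use of plain callables for the profile server and the component generator are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Hint = List[Tuple[str, float]]  # hypothetical hint format: (component id, probability)


class Preloader:
    def __init__(self,
                 profile_server: Callable[[str], Hint],
                 generate: Callable[[str], str],
                 threshold: float = 0.5) -> None:
        self.profile_server = profile_server  # answers hint requests
        self.generate = generate              # produces a component on demand
        self.threshold = threshold
        self.cache: Dict[str, str] = {}       # component cache

    def on_page_request(self, user_id: str) -> List[str]:
        """Request a hint and preload components predicted to be needed."""
        loaded = []
        for component_id, prob in self.profile_server(user_id):
            if prob >= self.threshold and component_id not in self.cache:
                self.cache[component_id] = self.generate(component_id)
                loaded.append(component_id)
        return loaded

    def retrieve(self, component_id: str) -> str:
        """Answer a content element retrieval request from the web/app server."""
        if component_id not in self.cache:
            self.cache[component_id] = self.generate(component_id)
        return self.cache[component_id]


hints = {"u1": [("jazz_coupon", 0.8), ("mystery_list", 0.1)]}
pre = Preloader(lambda user: hints.get(user, []), lambda cid: f"<div>{cid}</div>")
loaded = pre.on_page_request("u1")
html = pre.retrieve("jazz_coupon")
```

Only the component that the hint marks as likely is generated ahead of time; the low-probability component is left to be generated on demand if it is ever requested.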
In another aspect of the present invention, a method is provided for delivering a web page. A web page request is received where the web page request corresponds to a web page having at least one content element. It is determined whether a tagged content element resides in a component cache, where the tagged content element corresponds to the requested content element. A content response is generated for each content element request, wherein the content response includes the tagged content element if the tagged content element resides in the component cache. The requested content element is generated if the tagged content element does not reside in the component cache. A content element node is stored in the component cache, in response to a determination that the tagged content element does not reside in the component cache. The content element node corresponds to the generated content element. The web page is delivered with the requested content element(s).
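A minimal Python sketch of this get-or-generate flow follows. It assumes, for illustration only, that a tagged content element is keyed by a (page identifier, code block identifier) pair and that `generate` stands in for executing the corresponding tagged script fragment; the function `deliver_page` and the helper `render_block` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical (page id, code block id) key


def deliver_page(page_id: str,
                 block_ids: List[str],
                 cache: Dict[Key, str],
                 generate: Callable[[str, str], str]) -> str:
    """Assemble a page, reusing cached tagged content elements when present."""
    parts = []
    for block_id in block_ids:
        key = (page_id, block_id)
        content = cache.get(key)
        if content is None:
            # Cache miss: generate the element and store it for later requests.
            content = generate(page_id, block_id)
            cache[key] = content
        parts.append(content)
    return "".join(parts)


calls: List[str] = []


def render_block(page_id: str, block_id: str) -> str:
    calls.append(block_id)  # record how often the expensive path runs
    return f"[{block_id}]"


cache: Dict[Key, str] = {}
first = deliver_page("home", ["nav", "promo"], cache, render_block)
second = deliver_page("home", ["nav", "promo"], cache, render_block)
```

Both requests yield the same page, but the script fragments run only for the first one; the second is assembled entirely from the component cache.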
In yet another aspect of the present invention, a method is provided for caching a content element. A content element retrieval request is received that corresponds to the content element. A retrieval response is sent, in response to the content element retrieval request. The retrieval response indicates whether the content element resides in a component cache. A content element insertion request is received that corresponds to the content element. An insertion response is sent, in response to the content element insertion request. The insertion response indicates whether the content element was successfully inserted into the component cache. A determination is made as to whether the content element should reside in the component cache. The content element is removed from the component cache, in response to a determination that the content element should not reside in the component cache. The content element is associated with a content element node and is stored in the component cache, in response to a determination that the content element should reside in the component cache.
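The request/response protocol of this method might look like the following sketch. The response classes, the `should_reside` policy hook, the `revalidate` method, and the capacity check are assumptions made for illustration; in practice the residency decision would come from the cache replacement manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RetrievalResponse:
    found: bool                    # whether the element resides in the cache
    content: Optional[str] = None


@dataclass
class InsertionResponse:
    inserted: bool                 # whether the insertion succeeded


class CacheManager:
    def __init__(self, capacity: int, should_reside: Callable[[str], bool]) -> None:
        self.capacity = capacity
        self.should_reside = should_reside  # hypothetical residency policy hook
        self._cache: Dict[str, str] = {}

    def retrieve(self, key: str) -> RetrievalResponse:
        if key in self._cache:
            return RetrievalResponse(found=True, content=self._cache[key])
        return RetrievalResponse(found=False)

    def insert(self, key: str, content: str) -> InsertionResponse:
        full = key not in self._cache and len(self._cache) >= self.capacity
        if full or not self.should_reside(key):
            return InsertionResponse(inserted=False)
        self._cache[key] = content
        return InsertionResponse(inserted=True)

    def revalidate(self) -> List[str]:
        """Remove every element the policy says should no longer reside."""
        stale = [k for k in self._cache if not self.should_reside(k)]
        for k in stale:
            del self._cache[k]
        return stale


wanted = {"nav", "promo"}
mgr = CacheManager(capacity=2, should_reside=lambda k: k in wanted)
ok = mgr.insert("nav", "<nav>...</nav>")
rejected = mgr.insert("banner", "<div>ad</div>")
wanted.discard("nav")
removed = mgr.revalidate()
```

Returning explicit response objects lets the web/app server distinguish a cache miss from a failed insertion without inspecting the cache directly.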
In still another aspect of the invention, a component cache data structure is provided for storing a current content element. A content element node is associated with the current content element. The content element node comprises at least a NodeID data field, a NavProb data field, a NextNode data field, a Timestamp data field, and a Content data field. The NodeID data field comprises a PageID and a Code Block ID and uniquely identifies the current content element. The NavProb data field comprises a conditional probability that the current content element will be needed to generate a web page. The Timestamp data field comprises a time that the current content element was last accessed. The Content data field contains a representation of the current content element. The NextNode data field comprises an array structure containing at least one destination NodeID representing a destination content element that is reachable from the current content element.
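The node layout above maps naturally onto a record type. The following Python sketch mirrors the five named fields; the snake_case field names, the use of a wall-clock timestamp, the `touch` helper, and the dictionary keyed by NodeID standing in for the component cache are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple


class NodeID(NamedTuple):
    """Uniquely identifies a content element: (PageID, Code Block ID)."""
    page_id: str
    code_block_id: str


@dataclass
class ContentElementNode:
    node_id: NodeID                                        # NodeID field
    nav_prob: float                                        # NavProb: P(element needed for a page)
    next_node: List[NodeID] = field(default_factory=list)  # NextNode: reachable elements
    timestamp: float = field(default_factory=time.time)    # Timestamp: last access time
    content: str = ""                                      # Content: representation of the element

    def touch(self) -> None:
        """Record an access by refreshing the Timestamp field."""
        self.timestamp = time.time()


# A dictionary keyed by NodeID stands in for the component cache.
ComponentCache = Dict[NodeID, ContentElementNode]

nav_id = NodeID("home", "nav")
promo_id = NodeID("home", "promo")
cache: ComponentCache = {
    nav_id: ContentElementNode(nav_id, nav_prob=0.95, next_node=[promo_id],
                               content="<nav>...</nav>"),
}
cache[nav_id].touch()
```

Because NodeID is an immutable named tuple, it can serve directly as a dictionary key, and the NextNode list lets a replacement or prefetch policy follow the links from the current element to its likely successors.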
The various aspects of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the drawings and claims.