1. Field of the Invention
The present invention relates to the field of caching and, more particularly, to ESI-based edge server caching.
2. Description of the Related Art
As the Internet continues to evolve and as the World Wide Web (“the Web”) becomes more congested, significant attention is being given to reducing the load on the Web and increasing the efficiency of Web operations. One area that has been the focus of extensive research and development activity is the field of caching. Caching is simply the local or auxiliary storage of previously viewed content so that a request to view this same content can be served to a client browser without having to re-request the same content from the application server that served it in the first instance (referred to herein as the “originating server” or “origin server”). Caching, among other things, saves the origin server the trouble of having to recreate the content multiple times.
FIG. 1 illustrates a typical prior art web architecture 100. Multiple user terminals 102, 104, and 106 operate browsing software over the internet or other network connection 108 to retrieve content (e.g., web pages) from a content source 110. Typically content source 110 comprises one or more application servers 112 coupled to one or more databases (not shown). In this type of system, all of the content is generated and delivered from content source 110, requiring expensive infrastructure and placing heavy operational loads on the application servers 112.
The “origin server” for a particular block of content is the application server 112 that serves the original request for that block of content. Since the origin server is the source of the content, it is considered to be at the center of the network, and the clients (i.e., user terminals 102-106) are considered to be located at the outer edge of the network. Thus, the closer a network device is to the client, the closer it is to the “edge of the network.”
FIG. 2 illustrates a known improvement to the typical web architecture shown in FIG. 1. In FIG. 2, client servers 202, 204, and 206 still access the content source 210 via the internet/network 208. However, multiple auxiliary servers such as edge servers 220 are located closer to the outer edge of the network and act as intermediary servers that operate between the client servers and the content source. These “edge servers” were initially used to, among other things, cache web pages so that, if a request was made for a page that was already stored in one of the edge servers 220, the content could be immediately served back to the requesting client browser rather than having to traverse all the way to the content source 210 and tax the operational resources such as application servers 212 and databases 214.
“Whole-page caching” is a rudimentary form of caching in which an entire “page” of web content is cached upon retrieval from the originating server. Whole-page caching is effective for static web pages where the entire page is likely to remain unchanged for extended periods of time. Dynamic web pages, however, and their ability to create web content “on-the-fly” and/or customized web content depending upon the identity of the user, introduce significant caching challenges.
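By way of illustration only, whole-page caching at an edge server can be sketched as follows. The class name `WholePageCache`, the function `static_origin`, and the URL `/about` are hypothetical and are not taken from any particular product; the sketch assumes that pages are keyed by URL alone.

```python
from typing import Callable, Dict


class WholePageCache:
    """Hypothetical edge cache that stores complete pages keyed by URL."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._pages: Dict[str, str] = {}
        self.origin_hits = 0  # counts requests that had to reach the origin

    def get(self, url: str) -> str:
        # Serve from the local store when possible; otherwise fetch and keep a copy.
        if url not in self._pages:
            self.origin_hits += 1
            self._pages[url] = self._fetch(url)
        return self._pages[url]


def static_origin(url: str) -> str:
    """Stand-in for an origin server that returns an unchanging page."""
    return f"<html>static page for {url}</html>"


cache = WholePageCache(static_origin)
first = cache.get("/about")
second = cache.get("/about")  # served from the edge; the origin is not contacted
```

Because the key is the URL alone, a scheme of this kind has no means of distinguishing two users who should receive different content at the same URL, which is the difficulty that dynamic pages introduce.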
Edge Side Includes (ESI) is an emerging World Wide Web Consortium (“W3C”) standard that defines an XML-based markup language used to assemble markup language fragments for clients, such as HTTP clients. A thorough explanation of ESI can be found in the W3C “ESI Language Specification 1.0” (August 2001) at http://www.w3.org/TR/esi-lang. ESI permits a web page to be partitioned into fragments and dynamically assembled at an edge server, thus allowing several performance and space optimizations, previously impossible, to be realized. As an example, many web sites provide an identical “sidebar” of content on each page of the website and different content in a main portion of each page. When whole-page caching is utilized, multiple copies of the same sidebar will exist in different cache entries, thus wasting cache space. When ESI is used to delineate the sidebar as a fragment, only one version of the sidebar need exist in the cache and this single sidebar can be incorporated into pages to create the complete web pages, using ESI's fragment assembly capabilities, at the edge server.
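The sidebar example can be sketched as follows. The sketch is a minimal approximation that handles only self-closing `<esi:include src="..."/>` tags and is not a complete ESI processor; the fragment URL `/fragments/sidebar`, the page URLs, and the function `assemble` are hypothetical.

```python
import re

# Matches only the simple self-closing form of the ESI include tag.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

# A single cached copy of the sidebar, shared by every page that includes it.
fragment_cache = {"/fragments/sidebar": "<div>sidebar</div>"}

# Hypothetical templates: each page refers to the sidebar instead of embedding it.
templates = {
    "/home": '<esi:include src="/fragments/sidebar"/><main>home</main>',
    "/news": '<esi:include src="/fragments/sidebar"/><main>news</main>',
}


def assemble(template: str) -> str:
    """Replace each include tag with the cached fragment it names."""
    return ESI_INCLUDE.sub(lambda match: fragment_cache[match.group(1)], template)


home_page = assemble(templates["/home"])
news_page = assemble(templates["/news"])
```

Both assembled pages contain the sidebar, yet the cache holds only one sidebar entry, whereas whole-page caching would store the sidebar once per page.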
Another compelling use for ESI fragments is when an otherwise cacheable web page contains a small portion (or portions) that is either personalized for a particular user or class of users, or highly dynamic (e.g., weather maps). Even though the majority of the content on the page may be static content and thus be an excellent candidate for caching using whole page caching techniques, caching such pages as whole pages would not result in any advantage, since the cached pages would “expire” quickly. With ESI, the personalized portion of the page (or the highly dynamic portion, e.g., the weather map) is identified as an ESI fragment and the remaining portion of the page is now cacheable as a “template,” also called a top-level fragment. The ESI runtime processor utilizes its fragment-assembly capability to generate the complete page at the edge server as a concatenation of a template and fragments cacheable at the edge and fragments that need to be fetched from the originating server due to their personalized or dynamic nature. This reduces the amount of page processing that occurs on the originating server to only those portions that need to be executed there (i.e., the personalized/dynamic content), and thus reduces the processing overhead of the originating server.
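The assembly described above can be sketched as follows, under the simplifying assumption that a page is an ordered list of fragment identifiers. The identifiers `header`, `article`, and `weather-map` and the functions `origin_fetch` and `assemble_at_edge` are hypothetical.

```python
from typing import Dict, List

origin_calls: List[str] = []  # records which fragments required the origin


def origin_fetch(fragment_id: str) -> str:
    """Stand-in for a request to the originating server."""
    origin_calls.append(fragment_id)
    return f"[{fragment_id} generated at origin]"


# Fragments that are cacheable at the edge; the weather map is deliberately absent.
edge_cache: Dict[str, str] = {
    "header": "[cached header]",
    "article": "[cached article]",
}


def assemble_at_edge(fragment_ids: List[str]) -> str:
    """Concatenate cached fragments with fragments fetched from the origin."""
    parts = []
    for fragment_id in fragment_ids:
        if fragment_id in edge_cache:
            parts.append(edge_cache[fragment_id])
        else:
            parts.append(origin_fetch(fragment_id))
    return "".join(parts)


page = assemble_at_edge(["header", "weather-map", "article"])
```

In this sketch only the highly dynamic fragment reaches the originating server; the remainder of the page is produced entirely at the edge.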
FIGS. 3-6 illustrate a simple example of a situation where ESI can be used to advantage. In this example, content is personalized for particular groups of users (e.g., new customers designated with “silver” status; low-volume repeat customers designated with “gold” status; and high-volume repeat customers designated with “platinum” status). FIG. 3 illustrates a fully assembled web page 300 having a static “sidebar content” field 302 and a “variable content” field 304 which, in FIG. 3, shows “silver content” fragment 306 inserted in field 304. It is understood that the sidebar content could also be a fragment and that the template could be simply a “blank” page into which multiple fragments can be inserted.
FIG. 4 illustrates the template without any fragment inserted therein. The page would be whole-page cacheable but for the variable content field, which will change based on whether the person has silver, gold, or platinum status. In this situation, the variable content field 304 of the fully assembled web page 300 can be designated for insertion of ESI fragments, so that the template is cached at the edge server while the personalized fragments are cached at the origin server. In prior art systems, the entire web page 300 is dynamically assembled with the template at the origin server for a particular user by fetching the appropriate silver, gold, or platinum fragment based on the user's status at the time of assembly. FIGS. 5 and 6 illustrate separate platinum and gold fragments, respectively. The designation of the content into fragments permits the reduction of space overhead and simplifies operations, since it is simpler to write code when it is decomposed into reusable functions and pieces.
While it is clear that the use of ESI yields significant performance benefits, it always requires the addition of ESI tags to the application, and possibly other restructuring as well. If the web application is J2EE compliant, it is typically written as a series of JSPs that are aggregated using the well-known <jsp:include> mechanism. Using an application server program having dynamic caching capability, such as IBM's WebSphere, these “JSP includes” can be automatically converted to ESI includes, and the included JSPs can be made ready for delivery outside of the confines of a <jsp:include> statement for those JSPs identified as being edgeable (able to be moved to an edge-of-network server because they are not dependent upon back-end data or back-end transactional capabilities). However, problems may remain that prohibit the automatic restructuring of an existing J2EE-compliant application to leverage ESI. Specifically, when there are multiple versions of a fragment that are cacheable (such as the silver, gold, and platinum fragment versions discussed above), a method must be created to cache them separately, or they must not be cached at all. Under the prior art, this requires either breaking dynamic fragments into multiple static fragments with different names, or not caching them at the edge server, which requires more hits to the origin server.
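The difficulty with multiple cacheable versions of one fragment, and the prior art workaround of renaming, can be sketched as follows. The fragment URL `/offer.jsp`, the renamed URLs, and the functions `render_offer` and `get_offer_naive` are hypothetical and serve only to illustrate the collision.

```python
from typing import Dict


def render_offer(status: str) -> str:
    """Stand-in for the JSP that produces the status-specific fragment."""
    return f"<p>{status} offer</p>"


# A cache keyed by fragment URL alone cannot tell the three versions apart.
url_keyed_cache: Dict[str, str] = {}


def get_offer_naive(url: str, status: str) -> str:
    if url not in url_keyed_cache:
        url_keyed_cache[url] = render_offer(status)
    return url_keyed_cache[url]


silver_result = get_offer_naive("/offer.jsp", "silver")
gold_result = get_offer_naive("/offer.jsp", "gold")  # receives the silver fragment

# Prior art workaround: one separately named static fragment per version.
renamed_cache: Dict[str, str] = {
    f"/offer-{status}.jsp": render_offer(status)
    for status in ("silver", "gold", "platinum")
}
```

The renaming workaround caches each version correctly, but only because the application itself has been restructured by hand, which is what prevents the conversion from being automatic.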
A primary reason for this inability to automatically restructure an existing J2EE-compliant application to take advantage of ESI is the use of web “sessions.” The concept of sessions is well-known and is a useful abstraction, but it can also hinder the “edgification” of many application pages, i.e., the ability to modify the program so that the page is assembled and executed away from, and independent of, the data at the origin server. With sessions, a user logs in, thereby initiating a new session. The originating server verifies the login and, when returning the next page to the user, also sends a “set-cookie” header that includes a session ID linking the user's activity during the session (e.g., clickstream data) to user information stored on the originating server (e.g., address, demographics, etc.). The session information is then used for most of the customization required for any page, from personalizing advertisements to printing the user's name. The programming used for the session operation keeps all of the actual data on the originating server, and this raw data is not accessible to an edge server. The client supplies only the session ID in each request; all the session data is stored on the origin server, referenced by that session ID.
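The session arrangement described above can be sketched as follows. The class `OriginServer`, the helper `parse_session_id`, and the user data are hypothetical; the cookie name `JSESSIONID` is assumed here because it is the name conventionally used by servlet containers, and the source does not specify one.

```python
import uuid
from typing import Dict, Optional


class OriginServer:
    """Hypothetical origin that keeps all session data in its own memory."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, str]] = {}

    def login(self, name: str, status: str) -> str:
        # Create the session and return the value of a set-cookie header.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"name": name, "status": status}
        return f"JSESSIONID={session_id}"

    def lookup(self, session_id: str) -> Optional[Dict[str, str]]:
        return self._sessions.get(session_id)


def parse_session_id(cookie: str) -> str:
    """Extract the opaque session ID from a 'name=value' cookie string."""
    _, _, value = cookie.partition("=")
    return value


origin = OriginServer()
cookie = origin.login("Alice", "gold")
session_id = parse_session_id(cookie)

# The edge server sees the cookie but holds no copy of the session store.
edge_local_sessions: Dict[str, Dict[str, str]] = {}
edge_view = edge_local_sessions.get(session_id)
origin_view = origin.lookup(session_id)
```

The edge server in this sketch can read the session ID but cannot resolve it to the user's status, so it cannot select among the silver, gold, and platinum fragments without contacting the origin.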
Thus, it would be desirable to have a method, system, and/or computer program product that would allow existing application pages, e.g., J2EE application pages, to be easily and automatically converted so that they can be accessible for caching at edge servers.