1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer program product for caching dynamically generated content (including, but not limited to, dynamically generated Web pages), as well as determining when the cached content should be invalidated or purged.
2. Description of the Related Art
Techniques are well known for caching static content of files, Web pages, and so forth in order to improve the speed and efficiency of information retrieval. For example, when a user in an Internet environment requests delivery of a static Web page, the page may be initially retrieved from a remote server and then stored in a data store that is locally accessible to the requester's computing device. (Or, in some cases the retrieved page may be stored in a data store of an intermediate server.) Subsequent requests by this user for the same page may be intercepted by a caching system which retrieves the page from this local (or intermediate) data store. In this manner, the system overhead of the round trip to the remote server is avoided, thereby improving the system performance and reducing the response time to the user. On the other hand, subsequent requests for a previously cached Web page may require retrieval of a new version of the page from the remote server, based on the results of a cache invalidation algorithm or process that determines whether the currently cached version has become out of date or stale. The Hypertext Transfer Protocol ("HTTP"), which is commonly used to request and deliver Web pages, provides one such cache invalidation technique wherein the creator of the page content may include a header field value that specifies the date and time when the content expires (that is, when it should no longer be considered valid). File modification timestamps are commonly used for cache invalidation (also referred to as cache refreshing), where a newer timestamp implies that the content of a file has changed and thus should supersede a previously cached version thereof.
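By way of illustration only, the two static-content checks described above (an expiration date and time carried in an HTTP header field, and a comparison of file modification timestamps) might be sketched in Java as follows. The class name Staleness and its method names are hypothetical and are not taken from any product or specification; the sketch assumes the expiration value arrives in the RFC 1123 date format commonly used for the HTTP Expires header.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper illustrating expiration-based and timestamp-based staleness checks. */
final class Staleness {
    private Staleness() {}

    /** Parses an expiration value such as "Thu, 01 Dec 1994 16:00:00 GMT" (RFC 1123 format assumed). */
    static Instant parseExpires(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    /** The cached copy is expired once the current time reaches the expiration time. */
    static boolean isExpired(Instant expires, Instant now) {
        return !now.isBefore(expires);
    }

    /** A newer modification timestamp implies that the cached copy should be superseded. */
    static boolean isSuperseded(Instant cachedModified, Instant currentModified) {
        return currentModified.isAfter(cachedModified);
    }
}
```

Neither check considers how the content was produced, which is why they are suited to static content rather than to dynamically generated content.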
With dynamically generated content, the caching of data and the cache invalidation issues are more complex than with static content. At the same time, generation of dynamic content is typically a much slower and more expensive operation (in terms of computing resources used) than retrieval of static content, and thus there are significant performance improvements to be realized with a well-designed dynamic content caching technique.
Dynamically generated Web page content is becoming more prevalent in the Internet environment. One common, simple use of dynamic page content is the "visitor counts" which are often displayed on Web pages, with text such as "You are the 123rd visitor to this site since Jan. 1, 1999" (where the count of visitors is accumulated at the server and inserted into the page syntax before returning the page to the user). Other simple uses of dynamic content include displaying the current date and time on the dynamically generated page. More advanced techniques for dynamic content allow servers to provide Web pages that are tailored to the user's identification and other available information about the user. For example, servers providing travel reservation services commonly store information about the travel preferences of each of their users and then use this information when responding to inquiries from a particular user. Dynamic content may also be based upon user classes or categories, where one category of users will see one version of a Web page and where users in another category will see a different version, even though all users provided the same URL ("Uniform Resource Locator") to request the Web page from the same server. For example, some Web server sites provide different services to users who have registered in some manner (such as filling out an on-line questionnaire) or users who have a membership of some type (which may involve paying a fee in order to get enhanced services, or more detailed information). The difference in dynamically generated content from one generation to another may be as simple as including the user's name in the page, as a personalized electronic greeting; or, the differences may be more significant (such as dynamic content that changes based upon each particular user's past activities at this site). On-line shopping sites, for example, may include a recognition for repeat shoppers, such as thanking them for their previous order placed on some specific day or offering a special limited-availability discount.
A number of techniques exist for providing dynamic Web page content in response to a user request. An early technique is CGI ("Common Gateway Interface") scripts. Application programming interfaces (APIs) for writing plugins were developed to provide efficiency improvements for Web servers. Examples include the NSAPI from Netscape and the ISAPI for the Internet Information Server, or "IIS". Active Server Pages ("ASPs") and JavaServer Pages ("JSPs") were developed to provide even higher-level abstractions for writing logic to generate dynamic content, building on the plugin technology. ASPs are used on Microsoft Web servers to create ActiveX Controls, invoke their methods, and access their properties to generate dynamic content. JSPs may be used to create dynamic content using JavaBeans and in-line Java scripting in a page. ("Java", "JavaServer", and "JavaBeans" are trademarks of Sun Microsystems, Inc.) Another technique is the use of servlets, which are executable code objects that can be dynamically invoked by the Web server to process a user request. Servlets typically perform some specialized function, such as creating page content based on dynamic factors.
Dynamically generated content may reflect the result of extracting information from a backend data store (for example, by issuing complex queries against a database, by invoking a legacy host application, etc.). The Host Publisher software product available from the International Business Machines Corporation ("IBM") is an example of software that may be invoked in response to receiving a user request for Web content, where that content requires invoking a legacy host application or accessing a relational database. The data extracted using the legacy application may then be used to populate the properties of a Java bean, which can subsequently be used to generate dynamic HTML using JSP technology. The generated HTML page may then be transmitted to a user's computer, where it will typically be rendered with a user agent such as a browser. Or, the populated bean may be accessed from a standalone application (e.g. using an Enterprise JavaBean, or "EJB"). Generation of dynamic content in this manner involves a significant amount of processing overhead. Processing the user's content request involves making a connection to the legacy host system, completing a log-on process, navigating among various host screens to extract the pertinent information, and logging off. Similarly, generating content using complex database queries or other legacy data sources is also an expensive, high-overhead process. Retrieving previously-generated information from a cache, rather than generating the dynamic content anew, would make responding to these high-overhead content requests much more efficient.
Content that is dynamically generated may have a different result when the data used in the content creation changes, and/or when any of the application logic used in the creation process changes. On the other hand, depending on the particular data and application logic, such changes might not affect the resulting content. In addition to result differences caused by underlying data and logic changes, other factors may cause the generated content to vary from one invocation to another. One example of this situation is content that is time- or date-sensitive, such as a Web page that contains a time or date value. Another example is content that is designed to vary from one invocation of the page generation software to another, such as a Web page designed to accumulate and display a visitor count. For these types of Web pages, it does not make sense to cache the generated content. Because of the range of factors that may be involved in determining whether generated content changes, and when and how it changes, application-specific considerations must be accounted for in any viable cache invalidation technique when dynamic content is involved.
Prior art techniques exist which provide a dynamic content caching approach wherein a dynamically created Web page is stored along with the values of the HTTP input parameters which were passed with the page retrieval request (and therefore may have been passed to the page generation software). However, it may be necessary to use more factors than just the HTTP input parameters when creating dynamic Web page content. For example, state data is commonly used in Internet applications, where this state data requires special handling to overcome the inherent limitations of the stateless model on which HTTP is designed. Examples of applications which typically require state information are Internet shopping and e-commerce or e-business applications, which have gained tremendous popularity among Internet users in recent years. The Servlet API, which is typically used for creating such applications in the Java programming language, has defined an HttpSession class and methods, and ServletContext class and methods, which (along with mechanisms such as cookies and URL rewriting that enhance the capabilities of the HTTP protocol) enable state data to be maintained over the course of an on-going user session. (It is this state information that enables an Internet user to add items to a shopping cart, for example, thereby building an order transaction from the data communicated in multiple related message exchanges with a remote HTTP server.)
The historical data accumulated in this manner affects the content of a generated page; in the shopping cart example, information such as the number of entries in the cart, the order total, available credit for this shopper, etc., may change with each message exchange. Suppose the shopping program provides a "Display my shopping cart" or "Display my account balance" option to the user. In this case, the parameters sent on the HTTP request are likely to be the user's account number, an order number, or other similar information. If the resulting dynamically generated page is cached with only this type of input parameter being used to determine whether the page has become stale, then subsequent invocations of this same operation (sending the same input parameters), as the user continues shopping, will likely retrieve the result that was created and cached from the first invocation, giving the user an inaccurate response.
While this shopping cart example is pertinent to the Internet and e-commerce applications, it will be apparent that similar concerns exist in other environments and with other applications that generate dynamic content (for example, by issuing complex database queries).
Accordingly, what is needed is an improved technique for caching dynamically generated content.
An object of the present invention is to provide an improved technique for caching dynamically generated content.
It is another object of the present invention to provide this technique by accounting for application-specific factors in the caching and cache invalidation processes.
Another object of the present invention is to provide this technique for use with dynamically generated Web page content in an Internet environment.
Still another object of the present invention is to provide this technique where the cached content can be made available for use by applications other than the application which created the content.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a computer program product, a system, and a method for caching dynamically generated content. This technique comprises: receiving a request for dynamically generated content; initially processing this request; generating a response to the received request using output properties; and transmitting the generated response. Initially processing the request further comprises: generating an object to process the request; setting one or more input properties of the generated object; executing logic of the generated object, wherein one or more output properties of the object are set as a result of the execution; and caching the executed object along with the output properties.
The technique may further comprise: receiving a subsequent request for the dynamically generated content; determining whether a cached version of the object exists which can be used for creating the subsequently requested content; using the cached version when the determination has a positive result; responding to the subsequent request when the determination has a negative result; and transmitting the subsequently generated response to the subsequently received request. Using the cached version when the determination has a positive result further comprises: retrieving the output properties from the cached version; and generating a subsequent response to the subsequent request using the retrieved output properties. Responding to the subsequent request when the determination has a negative result further comprises: repeating operation of the initial processing; and generating the subsequent response to the subsequent request using the output properties.
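The request flow summarized above may be illustrated, in simplified form and under stated assumptions, by the following Java sketch. The class OutputCache and its method outputsFor are hypothetical names. The sketch retains only the output properties (as a map of names to values) rather than the executed object itself, and it assumes that a cache key has already been derived from the input properties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the cache-or-generate flow; a simplification, not the claimed implementation. */
final class OutputCache {
    // Cache key -> output properties set by a previously executed object.
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    /**
     * Returns cached output properties for the key when they exist; otherwise
     * executes the supplied logic once, caches its output properties, and returns them.
     */
    Map<String, String> outputsFor(String key,
                                   Map<String, String> inputProperties,
                                   Function<Map<String, String>, Map<String, String>> logic) {
        Map<String, String> cached = cache.get(key);
        if (cached != null) {
            return cached; // positive determination: reuse the cached output properties
        }
        // Negative determination: repeat the initial processing and cache the result.
        Map<String, String> outputs = logic.apply(inputProperties);
        cache.put(key, outputs);
        return outputs;
    }
}
```

In either case the response is then generated from the returned output properties, so the expensive logic is executed only when no usable cached version exists.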
Determining whether the cached version of the object exists may further comprise: storing invalidation criteria for a selected class of the objects; evaluating these invalidation criteria; purging the selected class of cached objects when the evaluated invalidation criteria are met; and setting the result of the determination to the negative result if the cached version is thereby purged.
Or, determining whether the cached version of the object exists may further comprise: storing invalidation criteria for a selected instance of the objects; evaluating these invalidation criteria; purging the selected instance when the evaluated invalidation criteria are met; and setting the result of the determination to the negative result if the cached version is thereby purged.
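The two alternatives just described (invalidation criteria stored for a class of objects, or for a single instance) might be sketched as follows. All names are hypothetical, and representing each criterion as a BooleanSupplier evaluated at lookup time is an assumption made for illustration; the criteria themselves are application-specific.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical cache that purges entries when stored invalidation criteria are met. */
final class InvalidatingCache {
    /** A cached entry tagged with the class name of the object that produced it. */
    private static final class Entry {
        final String className;
        final Object outputs;
        Entry(String className, Object outputs) {
            this.className = className;
            this.outputs = outputs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Map<String, BooleanSupplier> classCriteria = new HashMap<>();
    private final Map<String, BooleanSupplier> instanceCriteria = new HashMap<>();

    void put(String key, String className, Object outputs) {
        entries.put(key, new Entry(className, outputs));
    }

    void invalidateClassWhen(String className, BooleanSupplier criterion) {
        classCriteria.put(className, criterion);
    }

    void invalidateInstanceWhen(String key, BooleanSupplier criterion) {
        instanceCriteria.put(key, criterion);
    }

    /** Returns the cached outputs, or null (negative result) if absent or purged. */
    Object lookup(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        BooleanSupplier perClass = classCriteria.get(e.className);
        if (perClass != null && perClass.getAsBoolean()) {
            // Class-level criteria met: purge every cached object of this class.
            entries.values().removeIf(x -> x.className.equals(e.className));
            return null;
        }
        BooleanSupplier perInstance = instanceCriteria.get(key);
        if (perInstance != null && perInstance.getAsBoolean()) {
            entries.remove(key); // instance-level criteria met: purge this entry only
            return null;
        }
        return e.outputs;
    }
}
```

A null result corresponds to the negative determination, after which the initial processing would be repeated.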
The object may be a Java bean. The input properties may comprise one or more of: (1) a set of Hypertext Transfer Protocol (HTTP) input request parameters; (2) parameters stored in an HTTP session object; (3) parameters stored in a ServletContext object; and (4) information globally available to the executing logic. The received request and the subsequently received request may specify a JavaServer Page. (Alternatively, the received request may specify an Active Server Page.) The property names of the input properties and values thereof may be used as a key to identify the cached bean (or similarly, the cached object). The key may be formed by sorting the property names of the input properties, and concatenating the sorted property names and the values thereof with appropriate separators. The determination of whether the cached version exists may further comprise generating a subsequent key by sorting property names of subsequent input properties and concatenating the sorted subsequent property names and values thereof from the subsequently received request with appropriate separators.
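The key formation described above (sorting the property names, then concatenating the names and values with separators) may be illustrated by the following minimal Java sketch. The class CacheKeys, the method keyFor, and the choice of '=' and '&' as separators are assumptions made for illustration; the description above requires only "appropriate separators".

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that builds a cache key from input property names and values. */
final class CacheKeys {
    private CacheKeys() {}

    static String keyFor(Map<String, String> inputProperties) {
        StringBuilder key = new StringBuilder();
        // A TreeMap iterates in sorted name order, so the order in which the
        // properties were supplied does not affect the resulting key.
        for (Map.Entry<String, String> e : new TreeMap<>(inputProperties).entrySet()) {
            if (key.length() > 0) {
                key.append('&'); // assumed separator between properties
            }
            key.append(e.getKey()).append('=').append(e.getValue()); // assumed name/value separator
        }
        return key.toString();
    }
}
```

Under these assumptions, properties supplied as orderNumber=42 and account=A1, in either order, yield the key "account=A1&orderNumber=42". The sketch does not escape separator characters appearing inside names or values, which a practical implementation would need to address.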
Executing the logic may further comprise accessing one or more data repositories to retrieve information used in setting the output properties.
The cached version may be stored in an in-memory cache. Additional caching tiers beyond the in-memory cache may be used, wherein the additional caching tiers comprise one or more of: (1) a file system, and (2) a database. A plurality of servers may access the cached objects in the database.
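A lookup across multiple caching tiers might be sketched as follows. The interface CacheTier and the classes MapTier and TieredCache are hypothetical; MapTier is an in-memory stand-in for any tier (a file system or database tier would implement the same interface), and copying a found value into the faster tiers is a design assumption of this sketch rather than something stated above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction over one caching tier (memory, file system, or database). */
interface CacheTier {
    String get(String key);
    void put(String key, String value);
}

/** In-memory stand-in used here for every tier. */
final class MapTier implements CacheTier {
    private final Map<String, String> store = new HashMap<>();
    public String get(String key) { return store.get(key); }
    public void put(String key, String value) { store.put(key, value); }
}

/** Consults tiers in order, fastest first, and promotes hits into faster tiers. */
final class TieredCache {
    private final List<? extends CacheTier> tiers;

    TieredCache(List<? extends CacheTier> tiers) {
        this.tiers = tiers;
    }

    String get(String key) {
        for (int i = 0; i < tiers.size(); i++) {
            String value = tiers.get(i).get(key);
            if (value != null) {
                for (int j = 0; j < i; j++) {
                    tiers.get(j).put(key, value); // assumed promotion into faster tiers
                }
                return value;
            }
        }
        return null; // not cached in any tier
    }

    void put(String key, String value) {
        for (CacheTier tier : tiers) {
            tier.put(key, value); // write through to every tier
        }
    }
}
```

With a shared database as the last tier, several servers could each hold their own in-memory tier while reading objects that another server cached.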
Selected ones of the generated instances may not be cachable, in which case operation of caching is bypassed for these selected ones.
A remote application may access the cached objects. This remote application may use, for example, Remote Method Invocation (RMI) technology, Component Object Model (COM) technology, or CORBA (Common Object Request Broker Architecture) technology.
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.