The present invention relates generally to data caching of web content on a network and, more specifically, to a system for overriding the automatic caching of dynamic content in web pages in a web server.
The Internet and the World Wide Web (WWW) provide intra-enterprise connectivity, inter-enterprise connectivity and application hosting on a larger scale than ever before. By exploiting the broadly available and deployed standards of the Internet and the WWW, system users and designers can leverage a single architecture to build client/server applications for internal use that can reach outside to customers, business partners and suppliers.
FIG. 1 shows a commonly used network arrangement in which a plurality of local computer systems 200 in a local area network (LAN) may access a plurality of remote servers 100 through the Internet. Each remote server may be a web server (such as a Domino™ web server, available from Lotus Development Corporation of Cambridge, Mass.) for providing a web site for access by local computer systems 200. Each web site normally further provides a plurality of web pages to be served to the local computer systems upon request. Each local computer system may access the remote web sites with web browser software.
The WWW is a collection of servers on an IP (Internet Protocol) network, such as the Internet, an Intranet or an Extranet, that utilize the Hypertext Transfer Protocol (HTTP). Hereinafter, “Internet” will be used to refer to any IP network. HTTP is a known application protocol that provides users with access to files, which can be in different formats, such as text, graphics, images, sound, and video, using a standard page description language known as Hypertext Markup Language (HTML). Among a number of basic document formatting functions, HTML allows software developers to specify graphical pointers on displayed web pages, commonly referred to as “hyperlinks,” that point to other web pages resident on remote servers. Hyperlinks commonly are displayed as highlighted text or other graphical image on the web page. Selection of a hyperlink with a pointing device, such as a computer mouse, causes the local computer to download the HTML associated with the web page from a remote server. The browser then renders the HTML into the displayed web page.
Web pages accessed over the Internet, whether by a hyperlink, opening directly via an “open” button in the browser, or some other means, are commonly downloaded into the volatile cache of a local computer system. In a computer system, for example, the volatile cache is a high-speed buffer that temporarily stores web pages from accessed remote web sites. The volatile cache thus enables a user to quickly review web pages that were already downloaded, thereby eliminating the need to repeat the relatively slow process of traversing the Internet to access previously viewed web pages. This is called local caching.
On the server side, the first web servers were merely HTTP servers that resolved uniform resource locators (URLs) by extracting literally from the URL the path to a file that contained the needed page, and transmitting the page back to the browser. Such a server was very simple; it could only be used to access static pages.
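The literal URL-to-file resolution described above can be illustrated with a minimal sketch. The function name `resolve_static_path`, the document root `/var/www`, the default document `index.html`, and the dropping of `..` segments are assumptions made for illustration only; they are not taken from any particular server.

```python
from urllib.parse import unquote, urlsplit


def resolve_static_path(url: str, doc_root: str = "/var/www") -> str:
    """Map a URL to a file path by taking the URL path literally.

    doc_root and the index.html default are assumptions of this sketch.
    """
    path = unquote(urlsplit(url).path) or "/"
    # Directory-style URLs are mapped to a default document (assumed name).
    if path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so the result stays under doc_root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return doc_root.rstrip("/") + "/" + "/".join(segments)
```

Because the file named by the path is returned unchanged, a server of this kind can only ever serve static pages.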
A “static” page is a page which, each time it is requested and served to a requester, has the same byte content. That is, the byte content of the page does not depend upon which requester is requesting the page, when the requester is requesting the page, etc.; it remains the same. By contrast, a “dynamic page” is a page whose byte content may change depending upon the particular requester, when the page is being requested, etc. This will be discussed further below. It is important that web pages be served as quickly as possible, both to reduce the response time to a single user and to increase the number of users that can be served concurrently. To improve the response time, the web server uses caches. Web server caches are used to store web page responses in a readily accessible memory location so that when the web page is requested by a user, a previously cached web page response can be retrieved from cache and served quickly to the user.
Caching web page responses by the web server works quite well for web page responses having static content, i.e., content that doesn't change frequently. An example of a static web page is one, at a company's web site, comprising a compilation of text and graphics objects describing that company's history.
In fact, classic web servers cache static pages quite effectively. Specifically, classic web servers serve web page responses, some of which are static, namely, responses comprising HTML from the file system. Each of the static responses has a last modified date associated with it that is maintained by the file system. The contents of the response and its associated last modified date are simply stored in the cache for possible future use by the web server. When a subsequent request is received by the server for that page, the server requests the latest modification date for that page from the file system and compares the latest modification date with the last modified date associated with the candidate cached response. If the latest modification date is the same as the last modified date associated with the candidate cached response, the candidate cached response is considered to be “fresh” and is served in response to the request (i.e., to the requesting user). If the latest modification date is later than the last modified date associated with the candidate cached response, the candidate cached response is considered “stale” and a “fresh” response is retrieved and built by the web server for serving to the requesting user. The fresh response, along with its associated last modified date, is cached to replace the stale response. This caching scheme saves the time and server processor cycles that would otherwise be spent rebuilding requested pages whose cached responses are still fresh.
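The classic scheme just described can be sketched as follows. This is a minimal illustration, not the implementation of any particular web server: the class name `StaticPageCache` and the `builds` counter are hypothetical, and the file system's modification time (via `os.path.getmtime`) stands in for the last modified date. For simplicity the sketch treats any difference in dates as stale, whereas the text describes the later-date case.

```python
import os


class StaticPageCache:
    """Cache of static responses keyed by file path (illustrative sketch)."""

    def __init__(self):
        self._entries = {}  # path -> (last_modified, body)
        self.builds = 0     # number of times a response was (re)built

    def serve(self, path):
        # Ask the file system for the latest modification date.
        latest = os.path.getmtime(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == latest:
            return entry[1]  # dates match: the cached response is fresh
        # No entry, or the entry is stale: build a fresh response.
        with open(path, "rb") as f:
            body = f.read()
        self.builds += 1
        # Cache the fresh response with its date, replacing any stale one.
        self._entries[path] = (latest, body)
        return body
```

The only per-request cost for a fresh page is one metadata lookup, which is why this scheme works well when every response has a single, reliable last modified date.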
However, newer web servers provide not only static web pages but also dynamic web pages, i.e., pages whose byte content may change depending upon the particular requester, when the page is being requested, etc. Examples of dynamic web pages are pages containing content from a number of different sources or pages having computed content. For example, a page may contain macros that compute content for the page, i.e., the page has “computable content”. These macros may change the page content each time the page is accessed. This makes it difficult to cache that page using the classic caching method described above. (Macros, or formulas as they are named in Lotus Notes software, are expressions that perform a function, such as determining field values, defining which documents appear in a view, or calculating values for a column. Lotus Notes is available from Lotus Development Corporation in Cambridge, Mass.)
Alternatively, the page may contain information from a number of different sources, and that information may or may not have associated last modified dates, making it difficult, if not impossible, to cache using the classic caching method. For example, the page may comprise a composite of a number of “parts” including: other documents, designs from databases, content from databases, the present user's identity, the current time, the current environment, etc. Some of these parts are actual entities in the system, e.g., documents, databases, etc. Some parts, though, are “virtual” and are used to model the effects of the execution of macros or scripts, e.g., the user's identity may be accessed via one of a number of @functions such as @UserName, @UserRoles, etc., in Lotus Notes software. (“@functions” are macros for performing specialized tasks in Lotus Notes formulas. They can be used to format text strings, generate dates and times, format dates and times, evaluate conditional statements, calculate numeric values, calculate values in a list, convert text to numbers or numbers to text, or activate agents, actions, buttons, or hotspots.) These various part types are computable parts and have correspondingly various types of attributes that cannot be handled by the classic caching systems and methods of the prior art.
Clearly, it is more difficult to use caching as a mechanism for improving user response time for pages with dynamic content. This problem for the server is twofold. First, after building a web page response, the server must determine whether the response that it is preparing to serve the requesting user is cacheable (i.e., determining its cacheability). Second, the server, upon receiving a request for a web page whose previous response has been cached, must determine whether the cached response is valid (i.e., determining its validity) and applicable (i.e., determining its applicability). For instance, web page responses containing macros that are time-dependent may not be cacheable at all. If a page includes a macro for providing the current time, then every access of the page is unique and the page cannot be cached in memory at all. Another example is where a cached page is valid for serving to some users but not others. For instance, if the page includes a macro for the user's name, then the page can be cached for serving to that particular user, but not for serving to others. (HTML representing a document is specific to a user if macros are dependent on user name or user roles. Using this user data, some data may be made visible based on which user is requesting it.)
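The two examples above can be made concrete with a small sketch that derives a cache key from the macros a page uses. The function name `cache_key` and the two macro sets are hypothetical; @UserName and @UserRoles are named in the text, while @Now is assumed here merely as an example of a time-dependent macro.

```python
from typing import Optional, Set, Tuple

# Assumed groupings for illustration only.
TIME_DEPENDENT = {"@Now"}
USER_DEPENDENT = {"@UserName", "@UserRoles"}


def cache_key(page: str, user: str,
              macros: Set[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return a cache key for the response, or None if it is not cacheable."""
    if macros & TIME_DEPENDENT:
        return None            # every access is unique: do not cache
    if macros & USER_DEPENDENT:
        return (page, user)    # cacheable, but applicable only to this user
    return (page, None)        # cacheable and applicable to every user
```

Under these assumptions, `cache_key("/home", "alice", {"@UserName"})` and the same call for `"bob"` yield different keys, so one user's response is never served to another.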
The term “Dynamic HTML” (DHTML) needs to be explained in the context of the method and system of the present invention. “Dynamic” as used in DHTML refers primarily to the effect that the code has on the web page appearance at the browser. For instance, the dynamic HTML may comprise scripts that run on the browser to change the appearance of the web page, such as by displaying a button that, if pushed, displays additional text or graphics. The key distinction is that “dynamic” in the DHTML sense refers to the browser, not the server. From the server's point of view, a DHTML page may still be “static” in that the byte content may be the same each time the page is requested, so for the purposes of this invention, a DHTML page may be “static” or “dynamic” in the sense of the invention. The content of a static page is not dependent on anything, e.g., the properties of the request, such as the identity of the particular user, the time of day that the request is made, etc. “Dynamic” content, as used in the system and method of the present invention, refers to content that has such dependencies. Thus, “dynamic” in the DHTML sense is not related to “dynamic” in the sense of the invention.
As can be readily seen, using caching as a means for increasing server performance for responses which have dynamic content has a number of complications and difficulties which have not been overcome by any of the systems of the prior art. As such, HTML representing responses having dynamic content has not been cached in the past. Accordingly, a system and method for caching content that can include dynamic content, without suffering from the drawbacks discussed above, are needed.
According to the present invention, a caching system and method utilized within a web server is disclosed that automatically caches web content, such as a web page, that has dynamic content. The caching system and method of the present invention is utilized within a web server which receives requests for web pages and, based upon those requests, serves web page responses that were previously cached or, if those cached responses are either inapplicable or invalid, the server builds a new response and serves it to the requester. The caching system performs two critical functions: first, it determines the cacheability of built responses and caches those responses it deems cacheable and second, if a cached response appears appropriate for a particular web page request, the caching system examines the cached response to determine whether the cached response is applicable for the particular request and whether the cached response is still valid. Each response is comprised of a plurality of parts, some of the parts being dynamic in nature. The parts have associated attributes that, either explicitly or implicitly, characterize the nature of the parts. The caching system comprises an attribute analyzer that creates a composite set of attributes, the composite representing the characteristics of the response. The caching system further comprises a cacheability analyzer that analyzes the attribute composite set and determines the cacheability of the response. The server then caches the response based upon that determination. Examples of attributes utilized for determining cacheability include the time variance setting of the dynamic content, the user's identity, or the location of the content.
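One way to picture the attribute analyzer and the cacheability analyzer is the following sketch. It is an illustration under stated assumptions, not the disclosed implementation: the names `Attributes`, `Part`, `compose` and `is_cacheable` are hypothetical, only two of the example attributes (time variance and user dependence) are modeled, and the rule that any time-variant part makes the whole response uncacheable is an assumed policy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attributes:
    time_variant: bool = False     # content changes with the time of request
    user_dependent: bool = False   # content depends on the user's identity


@dataclass(frozen=True)
class Part:
    name: str
    attributes: Attributes


def compose(parts: Iterable[Part]) -> Attributes:
    """Attribute analyzer: fold the parts' attributes into one composite."""
    parts = list(parts)
    return Attributes(
        time_variant=any(p.attributes.time_variant for p in parts),
        user_dependent=any(p.attributes.user_dependent for p in parts),
    )


def is_cacheable(composite: Attributes) -> bool:
    """Cacheability analyzer (assumed policy: time-variant means no cache)."""
    return not composite.time_variant
```

The composite is deliberately the union of the parts' dependencies, so a single dynamic part is enough to change how the whole response is treated.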
The caching system further comprises a cached-response analyzer for analyzing the cached responses prior to serving to a requesting user. The cached-response analyzer comprises an applicability analyzer (for determining the applicability of the cached response to the particular request) and a validity analyzer (for determining the validity of the cached response). If the cached response passes the tests performed by these analyzers, it is served to the requesting user.
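The two checks performed by the cached-response analyzer might be sketched as below. All names (`CachedResponse`, `is_applicable`, `is_valid`, `analyze`) are hypothetical, and the specific tests are assumptions for illustration: applicability is reduced to a user match, and validity to a comparison of a build timestamp against the latest modification time of the underlying parts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedResponse:
    body: str
    user: Optional[str]   # None: built without user-specific content
    built_at: float       # time the response was built (assumed field)


def is_applicable(cached: CachedResponse, requesting_user: str) -> bool:
    """Applicability analyzer: does the response fit this request?"""
    return cached.user is None or cached.user == requesting_user


def is_valid(cached: CachedResponse, parts_last_modified: float) -> bool:
    """Validity analyzer: has any underlying part changed since the build?"""
    return parts_last_modified <= cached.built_at


def analyze(cached: CachedResponse, requesting_user: str,
            parts_last_modified: float) -> Optional[str]:
    """Return the cached body if it passes both tests, else None."""
    if is_applicable(cached, requesting_user) and is_valid(cached, parts_last_modified):
        return cached.body
    return None  # caller builds a new response instead
```

A `None` result corresponds to the case in which the server builds a new response and serves that instead.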
The caching system of the present invention further comprises a system for overriding the automatic analysis performed by the system. The override system can be set by the document creator, the page designer or the system designer.
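How such an override could interact with the automatic analysis is sketched below. The text does not state what settings the override system offers, so the three values of the hypothetical `CacheOverride` enumeration and the function `should_cache` are purely assumptions for illustration.

```python
from enum import Enum


class CacheOverride(Enum):
    """Assumed override settings; not taken from the text."""
    AUTO = "auto"      # defer to the automatic analysis
    ALWAYS = "always"  # cache regardless of the automatic analysis
    NEVER = "never"    # never cache, regardless of the automatic analysis


def should_cache(automatic_decision: bool,
                 override: CacheOverride = CacheOverride.AUTO) -> bool:
    """Combine the automatic cacheability decision with an override setting."""
    if override is CacheOverride.ALWAYS:
        return True
    if override is CacheOverride.NEVER:
        return False
    return automatic_decision
```

In this sketch the setting would be stored with the document, page design or system configuration, matching the three parties the text says may set it.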
The method steps may also be implemented in program code for modifying a computer system to cache information that has dynamic content.