A web browser is a computer program used for accessing sites or information on a network, such as the World Wide Web (WWW). Some of the more commonly used web browsers are Microsoft Internet Explorer®, Netscape Navigator®, Opera®, Mozilla®, and Apple Safari®. A client computer is a computer that executes a web browser. A web page is one or more files containing information that may be displayed on a client computer by a web browser. A web server is a computer that stores web page files and resides on the network to which the client computer is connected.
A web browser user accesses a web page by providing input, such as keyboard or mouse input, which specifies the desired web page. The input is a uniform resource locator (URL), commonly referred to as a web page address. An example of a web page URL is http://www.google.com. Another example of a URL is simply an Internet Protocol (IP) address of the web server, such as http://216.239.36.10. Yet another example of a URL is the address of a specific file on a web server, such as http://yourfavoriteserver.com/index.html. A hypertext link, or link, is text or an image displayed by a web browser that has a URL associated with it. When a user clicks on a link, the user is requesting the web browser to access the file or web page referred to by the URL associated with the link. A hypertext transfer protocol (http) request is a request issued by a web browser onto the network to retrieve from a web server a file specified by the URL.
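To illustrate what an http request for a URL contains, the following Python sketch splits a URL into the server name and the file path and builds a minimal HTTP/1.1 GET request from them. The function name build_get_request is hypothetical, the sketch handles only plain http URLs, and it only builds the request text; it does not send anything over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build a minimal HTTP/1.1 GET request for the file named by url (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch handles only http URLs")
    # An empty path (e.g. http://www.google.com) asks for the server's root document.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The request line names the file; the Host header names the web server.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


print(build_get_request("http://yourfavoriteserver.com/index.html"))
print(build_get_request("http://216.239.36.10"))
```

The same function covers all three URL forms given above: a server name, a bare IP address, and a path to a specific file.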
A web page may be composed of many individual files that must be transferred over the network from the web server to the client computer. A common type of web page file is a hypertext markup language (HTML or html) file. HTML is a markup language used to create web pages. In addition to html code, an html file may also include code in programming languages such as JavaScript® or VBScript®. Another common type of web page file is an image or graphics file, such as a .gif or .jpg file. Other types of web page files include document files such as .pdf files, as well as audio, video, and applet files. When a web browser parses an html file, it may encounter a reference to another file on the web server, such as a graphics file or a JavaScript file.
Consider the following web page named index.html, which contains html source code and references to two external graphics files named picture_A.jpg and picture_B.jpg. A user points his browser at http://www.yourfavoritewebserver.com/index.html. The web browser issues an http request on the network to the web server for index.html. The web server returns index.html to the client computer. The browser parses the code contained in index.html and determines that it references picture_A.jpg and picture_B.jpg on the web server. In response, the browser issues an http request for picture_A.jpg, the web server returns picture_A.jpg, and the browser displays picture_A.jpg; the browser then issues an http request for picture_B.jpg, the web server returns picture_B.jpg, and the browser displays picture_B.jpg.
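The parsing step in this example can be sketched in Python with the standard library's html.parser module. The markup assigned to INDEX_HTML is a hypothetical stand-in for index.html, and ReferenceCollector and files_to_request are illustrative names. The sketch only lists the files the browser would request, in the order described above; it does not issue real http requests.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collects src attributes of <img> and <script> tags in document order."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag in ("img", "script"):
            for name, value in attrs:
                if name == "src" and value:
                    self.references.append(value)


# Hypothetical contents of index.html referencing two external graphics files.
INDEX_HTML = """<html><body>
<h1>Welcome</h1>
<img src="picture_A.jpg">
<p>Some text</p>
<img src="picture_B.jpg">
</body></html>"""


def files_to_request(page_name, html):
    """Return the page itself followed by every file it references."""
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return [page_name] + collector.references


for name in files_to_request("index.html", INDEX_HTML):
    print("http request for", name)
```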
Web browser users are familiar with the relatively long delay experienced when waiting for a web page to be loaded from the web server and displayed on the client computer. One cause of the delay is the slow transfer speed of data across the network relative to, for example, the data transfer rate from the client computer's disk drive. The client computer may be connected to the network by a 56K modem, for example, which has a relatively slow data transfer speed. Even if the client computer is connected to the network via a faster medium, such as a cable modem or a T1 connection, some of the web page files that must be transferred over the network from the web server to the client, such as some image files, are so large that they require a relatively long time to transfer even at high transfer rates.
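As a rough illustration of why large files are slow to transfer, the following sketch computes idealized transfer times for a hypothetical 2 MB image file. It ignores latency and protocol overhead, and assumes the 56K modem actually reaches its nominal 56 kbit/s, which in practice it rarely does. Under these assumptions the file takes about 300 seconds (roughly five minutes) over the modem, and still about 11 seconds over a T1 line (1.544 Mbit/s).

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: ignores latency, protocol overhead, and compression."""
    return size_bytes * 8 / link_bits_per_second


TWO_MB = 2 * 1024 * 1024  # bytes, for a hypothetical large image file

print(f"56 kbit/s modem:   {transfer_seconds(TWO_MB, 56_000):.0f} s")
print(f"T1 (1.544 Mbit/s): {transfer_seconds(TWO_MB, 1_544_000):.1f} s")
```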
To reduce the delay, web browsers typically employ a cache, referred to as a browser cache, on a mass storage device of the client computer, such as a disk drive. When a browser retrieves a file from a web server, the browser saves a copy of the file in the browser cache. The next time the file is requested, the browser checks the browser cache to see whether the requested file is present. A query to the browser cache revealing that the file is not present is referred to as a cache miss. A query to the browser cache revealing that the file is present is referred to as a cache hit. If the file hits in the cache, the browser can satisfy the request for the file from its cache instead of issuing an http request on the network to the web server. In the example above, index.html, picture_A.jpg, and picture_B.jpg will all be cached in the browser cache after being returned by the web server. Future accesses to these files may be satisfied from the browser cache, thereby avoiding the potentially long delay of transferring the files from the web server across the network again.
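The hit-or-miss lookup described above can be modeled with a short Python sketch. BrowserCache and fake_server are hypothetical names, and for simplicity the cache is kept in memory rather than on a disk drive. Viewing the three-file page from the example twice yields three misses on the first visit and three hits on the second, so the server sees each file requested only once.

```python
from typing import Callable, Dict


class BrowserCache:
    """Minimal in-memory model of a browser cache keyed by file name or URL."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]):
        self._fetch = fetch_from_server  # stands in for issuing an http request
        self._files: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._files:  # cache hit: no network traffic
            self.hits += 1
            return self._files[url]
        self.misses += 1  # cache miss: request the file from the web server
        content = self._fetch(url)
        self._files[url] = content  # save a copy for future requests
        return content


server_requests = []


def fake_server(url: str) -> bytes:
    """Hypothetical web server that records every request it receives."""
    server_requests.append(url)
    return b"contents of " + url.encode()


cache = BrowserCache(fake_server)
page = ["index.html", "picture_A.jpg", "picture_B.jpg"]
for _visit in range(2):  # view the same page twice
    for name in page:
        cache.get(name)
print(cache.hits, cache.misses, server_requests)
```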
With respect to caching, however, the files transferred from a web server to a client may be classified into two categories. A static file is a file whose content does not change. A common example of a static file is an image file. A dynamic file is a file whose content may change. An example of a dynamic file is an html file that contains changing content, such as the player statistics of a basketball game in progress or stock market price information. Caching static files is beneficial. Caching dynamic files, however, may cause undesirable operation, since the user may receive stale, out-of-date information.
Current browser caching technology does not handle the distinction between static files and dynamic files well. For example, Internet Explorer enables a user to choose from four caching policy settings. A first setting specifies that when the user returns to a previously viewed web page, the browser checks with the web server for changes to the page since the page was last accessed. That is, the browser ignores its cache and issues a new http request for all the files making up the web page. A second setting specifies that when the user returns to a previously viewed web page, the browser never checks with the web server for changes to the page. That is, the web browser always looks to its cache for requested files and never makes a new request for a file that hits in the browser cache, even though a newer version of the file may exist on the web server, i.e., even though the file's contents may have changed. With this setting, the user must click on the Refresh button to force the browser to re-access the web server. A third setting specifies that when the user returns to a previously viewed web page, the browser does not check with the web server unless the previous visit was in an earlier session of the browser or on a previous day. The fourth setting is similar to the third, except that if the browser determines that the files on the page are changing infrequently, the browser checks with the web server even less frequently. Other browsers include a setting that allows the user to specify an age; if the cached version of a file is older than the specified age, the browser accesses the server rather than satisfying the request from the cache.
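The settings described above can be summarized as a single decision function. The following Python sketch is a simplified model under stated assumptions, not any browser's actual implementation: the Policy names, the CachedFile record, and must_contact_server are hypothetical, and the fourth (automatic) setting is omitted because its frequency heuristic is not specified. Note that the function receives no information about whether a particular file is static or dynamic; the same rule applies to every file of the page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum, auto
from typing import Optional


class Policy(Enum):
    """Hypothetical names for the caching settings described in the text."""
    EVERY_VISIT = auto()       # first setting: always check with the server
    NEVER = auto()             # second setting: never check while the file is cached
    ONCE_PER_SESSION = auto()  # third setting: check if cached in an earlier session or day
    MAX_AGE = auto()           # age-based setting offered by other browsers


@dataclass
class CachedFile:
    fetched_at: datetime  # when the cached copy was stored
    session_id: int       # browser session in which it was stored


def must_contact_server(policy: Policy, entry: Optional[CachedFile], now: datetime,
                        session_id: int, max_age: Optional[timedelta] = None) -> bool:
    """Return True if the browser should request the file from the web server."""
    if entry is None:
        return True  # cache miss: every policy must go to the server
    if policy is Policy.EVERY_VISIT:
        return True
    if policy is Policy.NEVER:
        return False  # only the Refresh button would force a new request
    if policy is Policy.ONCE_PER_SESSION:
        return entry.session_id != session_id or entry.fetched_at.date() < now.date()
    if policy is Policy.MAX_AGE:
        if max_age is None:
            raise ValueError("MAX_AGE policy requires max_age")
        return now - entry.fetched_at > max_age
    raise ValueError(f"unsupported policy: {policy}")


# Example: a file cached at 09:00 today in the current session (session 1).
now = datetime(2024, 1, 2, 12, 0)
entry = CachedFile(fetched_at=datetime(2024, 1, 2, 9, 0), session_id=1)
for policy in Policy:
    print(policy.name, must_contact_server(policy, entry, now, 1, timedelta(hours=1)))
```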
As may be seen from the discussion above, current browsers have only a limited ability to determine whether an entire web page and its associated files should or should not be cached. However, the present inventors are not aware of a web browser that can determine which individual files making up a web page must be re-fetched from the web server in order to display the current content of the web page. That is, the browser cannot determine which files of a web page are static and which are dynamic. This inability may be detrimental to the performance of dynamic web applications, since the user must set the browser caching policy to disable caching in order to avoid receiving stale data, which forces all the web page files to be re-fetched from the web server. Yet in some applications a large percentage of the web page content may be contained in static files that could be satisfied from the cache, while the dynamic files may constitute only a small percentage of the data that must be transferred from the server to the client.
Using the example above, assume index.html is a 4 KB dynamic file, and that picture_A.jpg and picture_B.jpg are each static 2 MB files. If a distinction could be made between static and dynamic files, the browser could satisfy subsequent requests for picture_A.jpg and picture_B.jpg from its cache, and re-fetch only index.html from the server, thereby potentially improving performance substantially.
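The arithmetic behind this example can be checked with a few lines of Python, assuming 1 KB = 1,024 bytes and 1 MB = 1,024 KB (the text does not specify). bytes_on_revisit is a hypothetical helper. With caching disabled for the whole page, a revisit re-fetches 4 KB + 2 MB + 2 MB = 4,100 KB; if only the dynamic file is re-fetched, the revisit transfers 4 KB, roughly one thousandth of the data.

```python
KB = 1024
MB = 1024 * KB

# Sizes from the example: (size in bytes, is_dynamic) for each file of the page.
page_files = {
    "index.html": (4 * KB, True),
    "picture_A.jpg": (2 * MB, False),
    "picture_B.jpg": (2 * MB, False),
}


def bytes_on_revisit(files, selective):
    """Bytes re-fetched from the server when the page is viewed again.

    selective=False models caching disabled for the whole page (every file re-fetched);
    selective=True models the desired behavior (only dynamic files re-fetched).
    """
    return sum(size for size, dynamic in files.values() if dynamic or not selective)


all_files = bytes_on_revisit(page_files, selective=False)
dynamic_only = bytes_on_revisit(page_files, selective=True)
print(all_files // KB, "KB vs", dynamic_only // KB, "KB")
```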
Therefore, what is needed is a method for selectively defeating browser caching on a file-by-file basis so that dynamic files are obtained from the web server, while static files are quickly obtained from the browser cache, thereby improving overall performance.
Another limitation of current web browser caching technology with respect to a web page that includes both static and dynamic files is that it does not provide an ability to control file caching that may be performed by other computers in the network between the client computer and the web server.
Therefore, what is also needed is a method for selectively defeating network file caching on a file-by-file basis so that dynamic files are obtained from the web server while static files are quickly obtained from the browser cache, thereby improving overall performance.