1. Technical Field
The present invention relates in general to displaying web pages on a web browser and in particular to the caching and displaying of web pages downloaded from the Internet using a web browser. Still more particularly, the present invention relates to a method and system for identifying cached web pages to a user using a modified web browser.
2. Description of the Related Art
The development of computerized distributed information resources, such as the "Internet," allows users to link with servers and networks, and thus retrieve vast amounts of electronic information heretofore unavailable in an electronic medium. Such electronic information increasingly is displacing more conventional means of information transmission, such as newspapers, magazines, and even television. The term "Internet" is an abbreviation for "Inter-network," and refers commonly to a collection of computer networks. TCP/IP is an acronym for "Transmission Control Protocol/Internet Protocol," a software protocol developed by the Department of Defense for communication between computers.
Internet services are typically accessed by specifying a unique address, or uniform resource locator (URL). The URL has two basic components, the protocol to be used, and the object pathname. For example, the URL "http://www.uspto.gov" (home page for the United States Patent and Trademark Office) specifies a hypertext transfer protocol ("http") and a pathname of the server ("www.uspto.gov"). The server name is associated with a unique numeric value (a TCP/IP address, or "domain").
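As a rough illustration of the two URL components described above, the following Python sketch splits the example address with the standard library's urllib.parse module. The helper name url_components and the dictionary keys are assumptions introduced here for illustration only.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Split a URL into the protocol and the server/pathname parts.

    The function name and the returned keys are illustrative assumptions.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "server": parts.hostname,   # e.g. "www.uspto.gov"
        "path": parts.path,         # empty when the URL names only the server
    }


if __name__ == "__main__":
    print(url_components("http://www.uspto.gov"))
```

For the example address, the protocol component is "http" and the server component is "www.uspto.gov"; the path is empty because the URL names only the server.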
The Internet has rapidly become a valuable source of information to all segments of society. In addition to commercial enterprises utilizing the Internet as an integral part of their marketing efforts in promoting their products or services, many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service. The information provided is often updated regularly to keep the users up to date with changes which may occur from time to time.
The World Wide Web (Web) is a graphic, interactive interface for the Internet. There are different programs (web browser clients, referred to hereinafter as web browser) on a data processing system (also referred to as a computer) connected to the Web that are utilized to access servers (a program on another data processing system) connected to the Web. The program on the server is generally termed a "web site." Web sites are a collection of "web pages," where web pages are graphic displays which are usually linked together and may be downloaded to a data processing system utilizing a browser client. Each web page has a unique address, or Uniform Resource Locator (URL), within the Web that is accessible by utilizing Transmission Control Protocol/Internet Protocol (TCP/IP) transactions via telecommunication networks and a modem. The address allows Internet "browser" clients (computer program applications) to connect and communicate with a HyperText Transfer Protocol (HTTP) server over the Web.
Retrieval of information on the Web is generally accomplished with a hypertext markup language (HTML)-compatible "browser," an application program at the client machine capable of submitting a request for information identified by a URL. The information is provided to the client formatted according to HTML.
Each WWW address specifies or implies a reference to one particular site on the Internet. This means that without some kind of additional machinery, whenever a person requests a specific WWW address, no matter where she is from and no matter how often others in her network request the same address, she will make a network call to that specific site, leading to unnecessarily high use of network links and excessive load on the servers for popular sites.
High use of network lines and excessive load on popular servers lead to one of the single biggest problems experienced by Internet users today: lack of adequate bandwidth. Information abounds on the Internet, but the delay involved in retrieving that information frustrates many users. Until the Internet infrastructure upgrades to bigger "pipes" which can transmit greater amounts of information in the same amount of time, Web surfers must look to other means to relieve the congestion.
When Web pages are retrieved under direct user control, it is common practice for contemporary Web browsers to cache pages accessed by the user. Large traffic demands to specific Web sites can make access to such sites difficult. The amount of time which a user must wait to view a Web page during peak utilization periods can be very long. Network bandwidth is finite, and the time required to retrieve a Web page depends in part on the number of servers at the site from which the Web page is being retrieved. Furthermore, Web pages often include sizable graphics files or other large files requiring a substantial amount of time to transfer from the source to the requesting client. Caching Web pages allows a user to repeatedly view the information within a short span of time without retrieving the Web pages each time. It provides a local (or networked) copy of a web page previously retrieved off the Internet to speed up reloading of the page when desired.
Caching is a generic term meaning "to store." It is typically done to reduce Internet traffic. As applied to the Internet, "caching" means the copying of a web page, made incidental to the first access to the page, and storage of that copy for the purpose of speeding subsequent access.
Caching helps to relieve Internet congestion by shortening user access time and by decreasing the bandwidth that each user consumes, as well as the bandwidth used on the Internet generally, on network servers, and on remote servers.
There are two ways to cache web pages on the Internet: "client caching" and "proxy caching." Client caches reside within an individual user's Web browser (such as Netscape or Mosaic). Client caching takes two forms: persistent and non-persistent. A persistent client cache retains its documents between invocations of the Web browser. Netscape uses a persistent cache. A non-persistent client cache (used in Mosaic) releases any memory or disk space used for caching when the user quits the browser.
When the user's computer requests a web page, the computer will first check to see if the data requested already resides in the cache. If the cache has a copy of the requested data, then the cache provides the data very quickly to the user. If the data is not in the cache, the computer fetches the item needed from the Internet, and also stores a copy in the cache. Now the cache has this data available if the computer requests it again. The larger the cache, the more data the cache can store and the more likely the cache will have the requested item.
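The client-side lookup described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of any particular browser: the class name ClientCache, the injected fetch callable (standing in for retrieval from the Internet), and the least-recently-used eviction policy applied when the capacity is exceeded are all hypothetical choices made here.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class ClientCache:
    """Check the cache first; on a miss, fetch the page and store a copy."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 100) -> None:
        self._fetch = fetch            # hypothetical stand-in for a network fetch
        self._capacity = capacity      # larger capacity -> more pages retained
        self._pages = OrderedDict()    # url -> page body, oldest use first

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (page body, True if it was served from the cache)."""
        if url in self._pages:
            self._pages.move_to_end(url)      # mark as most recently used
            return self._pages[url], True
        body = self._fetch(url)               # miss: go to the Internet
        self._pages[url] = body               # keep a copy for next time
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)   # assumed policy: evict the oldest
        return body, False
```

The second element of the returned tuple records whether the page came from the cache, which is the information a conventional browser does not surface to the user.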
The second form of caching, "proxy caching," takes place on a network used by the World Wide Web ("WWW" or "Web"). Proxy caches reside on machines in strategic places (typically gateways) in the network of the WWW. Proxy servers act as intermediaries between local clients and remote content servers. Caching of Web pages is also performed at proxies. Thus, caching in proxies, which serve an entire intranet, can benefit the entire local network.
When a user asks a client for a certain web page, the client heads out to the Internet. If there is a caching proxy, client requests go to the proxy server, not to the remote web server. The proxy checks to see if it has already cached the requested page. If the proxy server has cached a copy of the web page, it returns the page to the client directly. Returning cached information to clients is fast because it requires less Internet activity. Caching also reduces the computational load on the remote content server and makes it possible for that server to supply data to many more machines. If the proxy server does not have a cached copy of the requested document, it goes out to the remote web server, retrieves the original, and passes the data back to the client while keeping a copy in its cache.
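The proxy flow described above, in which several clients share one intermediate cache, may be sketched as follows. The class names CachingProxy and Client, the origin_requests counter, and the injected fetch_from_origin callable are hypothetical names introduced for illustration; a real proxy would also handle expiry and validation, which are omitted here.

```python
from typing import Callable, Dict


class CachingProxy:
    """Intermediary that answers from its own cache when it can."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # stand-in for the remote server
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0                     # how often the origin was contacted

    def handle(self, url: str) -> bytes:
        if url not in self._store:
            # Miss: retrieve the original and keep a copy for later clients.
            self.origin_requests += 1
            self._store[url] = self._fetch_from_origin(url)
        return self._store[url]


class Client:
    """A browser that sends every request to the proxy, not to the origin."""

    def __init__(self, proxy: CachingProxy) -> None:
        self._proxy = proxy

    def request(self, url: str) -> bytes:
        return self._proxy.handle(url)
```

With two clients behind the same proxy requesting the same address, the origin server is contacted only once, which is the load reduction the passage describes.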
As described above, when users request information from a remote website, they may in fact receive that information from a cache (either local cache or proxy cache). If the cached information is "stale" (i.e., the remote website has changed its content since it was cached), the user has received, at best, outdated information and, at worst, harmful and misleading information. The degree of the threat of stale information depends on the nature of the website's content. If a user requests today's Dilbert cartoon, but receives yesterday's cartoon because the cache has not updated yet, the user suffers little harm beyond annoyance.
However, when the user utilizes the Internet for financial transactions, for example, when investing money based on a cached page of the NYSE ticker page or relying on stock quotes, the information received and displayed on the web page has to be the most current information available. Financial and other similar sites may change their information regularly (e.g., every 15 minutes, or every 10 seconds). In such a situation, information present on a web site when it is first downloaded and cached may be vastly different from the information available 30 seconds later when the site is revisited.
Thus, in the current world wide web situation, it is very possible and sometimes common for web pages to be updated regularly. Some pages, such as financial pages, need to be viewed with the latest data by those interested in a quickly changing situation. However, the web browser cache can sometimes display cached data without the user's knowledge. In fact, it is more often the case that the cached page is displayed rather than a freshly downloaded page. Some pages may contain a time/date stamp; however, the page manager determines whether to include a stamp, and most pages are not time/date stamped. In some instances, the software/application is set to automatically update the time/date each time the document is accessed, giving the user the perception that the document has just been retrieved. Thus the stamp does not necessarily indicate to a user that the page was cached. In instances when the updates are irregular and the stamp is of the time of the last update, the stamp provides little useful information to the user concerning the status of the displayed copy. Thus, the time/date stamp is an unreliable indicator of the status of the document being retrieved.
Current caching technology in programs other than web browsers is understandably hidden, as users of these programs (or operating systems) do not want to be notified when the data being used is cached data. However, on a web browser, sometimes it is acceptable to use cached data and sometimes it is unacceptable. For this reason, web browsers are typically designed with a "reload" (or refresh) button by which the user may manually override the use of cached data. However, users currently cannot tell or guess correctly whether what they are reading is cached data or not. Because caching depends on the preferences set in each individual user's browser, web page caching cannot be easily predicted from browser to browser on each person's workstation.
Often during operation, current browser technology tries to contact a server site. If it is unable to do so, but it has a previously cached page from the same site, a pop-up message will notify the user that the program is using cached data instead. However, this notification only covers cases where a server site is unreachable. It does not cover the case where the user is heavily using the "Back" and "Forward" options on a browser. In that case, the user is heavily using cached data with an occasional non-cached page inserted into the path.
Some prior art methods permit periodic retrieval of newer copies of a cached web page. In this method, a timer-based mechanism is alerted to download a new copy of the web page from the server and store it in the cache location. This method thus provides a current copy of the web page information only if it is accessed on screen immediately after the cache retrieval operation. At all other times the cached page is still potentially "stale."
In other prior art methods, as implemented in AOL 4.0, a user is provided with the option of determining, upon startup of the browser application, whether he wishes every request for a page to be responded to with a cached page or with a newer version of the page. No indication is provided to the user as to whether the page information has changed, and the user is forced to wait while the request is sent over the Internet even when a cached page would be appropriate.
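The timer-based prior art approach, and the reason it leaves the cached copy stale between refreshes, may be illustrated with the following sketch. The class name TimedCacheEntry, the on_timer method, and the injected fetch callable are hypothetical; time is passed in explicitly as a number of seconds so that the behavior can be followed without a real timer.

```python
from typing import Callable


class TimedCacheEntry:
    """Cached page that is re-downloaded only when a refresh interval elapses."""

    def __init__(self, url: str, fetch: Callable[[str], bytes],
                 interval_s: float, now: float) -> None:
        self._url = url
        self._fetch = fetch              # hypothetical stand-in for a download
        self._interval_s = interval_s
        self.body = fetch(url)           # initial copy stored in the cache
        self.fetched_at = now

    def on_timer(self, now: float) -> bool:
        """Refresh the copy if the interval has elapsed; report whether it did."""
        if now - self.fetched_at < self._interval_s:
            return False                 # cached copy is kept, possibly stale
        self.body = self._fetch(self._url)
        self.fetched_at = now
        return True
```

If the remote page changes ten seconds after a refresh and the interval is sixty seconds, every view during the remaining fifty seconds shows the old copy, with nothing to tell the user so.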
Another potential problem exists when there is more than one user of a web browser application on a terminal/computer. With settable browsers, the primary user may set the browser application to always utilize cached pages. When a second user attempts to use the browser for downloading web pages, he has no way of knowing that the page is a cached page when the first user had previously downloaded the web page.
The present invention recognizes that it would therefore be desirable to have a method and system for distinguishing to a user whether a displayed web page is a cached web page or newly downloaded web page.
It is therefore one object of the present invention to provide an improved method and system for displaying web pages on a web browser.
It is another object of the present invention to provide an improved method and system for displaying cached web pages on a web browser.
It is yet another object of the present invention to provide an improved method and system for identifying cached web pages to a user via a modified web browser application.
The foregoing objects are achieved as is now described. A modified web browser application on a data processing system for use in searching the Internet and displaying web pages is disclosed. The modified web browser has a cache area which caches/stores a copy of a web page downloaded from the Internet. When a particular page is requested, logic components within the modified web browser application determine if the particular page is resident in the cache area. If the particular page is resident in the cache area, it is displayed within the modified web browser along with an indicator by which the user is notified that the particular page displayed is cached. In one embodiment, the indicator is a cache message button which is displayed within the web browser. In another embodiment, the indicator is a color-coded scheme which causes the web page or web page border to be displayed in a different color whenever the particular page is cached. In a third embodiment, the indicator is presented as an interactive dialog box having instructions to the user to select a refresh option if display of the cached page is not desired. In yet another embodiment, the indicator or dialog box indicates to the user the location of the cached document (i.e., local cache or proxy cache).
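By way of illustration only, the determination and notification steps summarized above might be sketched as follows. This is not the disclosed implementation: the names Source, DisplayedPage, IndicatingBrowser, open_page and refresh, the wording of the indicator text, and the injected fetch callable are all assumptions made for this sketch, and only the local cache case is modeled.

```python
from enum import Enum
from typing import Callable, Dict


class Source(Enum):
    NETWORK = "freshly downloaded"
    LOCAL_CACHE = "local cache"


class DisplayedPage:
    """A page as shown to the user, together with where it came from."""

    def __init__(self, url: str, body: bytes, source: Source) -> None:
        self.url = url
        self.body = body
        self.source = source

    @property
    def indicator(self) -> str:
        """Text for a hypothetical cache message button; empty if not cached."""
        if self.source is Source.NETWORK:
            return ""
        return "CACHED (" + self.source.value + "): select Refresh for a new copy"


class IndicatingBrowser:
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # hypothetical network download
        self._cache: Dict[str, bytes] = {}   # the browser's cache area

    def open_page(self, url: str) -> DisplayedPage:
        # Determine whether the requested page is resident in the cache area.
        if url in self._cache:
            return DisplayedPage(url, self._cache[url], Source.LOCAL_CACHE)
        return self.refresh(url)

    def refresh(self, url: str) -> DisplayedPage:
        # Manual override: bypass the cache and store the new copy.
        body = self._fetch(url)
        self._cache[url] = body
        return DisplayedPage(url, body, Source.NETWORK)
```

In this sketch a first visit yields an empty indicator, a revisit yields the cached notice, and selecting refresh returns a newly downloaded copy with the notice cleared. A color-coded border or a dialog box could be driven from the same source field.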