1. Field of the Invention
This invention pertains to the arts of electronic web site technology, and especially to the technologies of web site mirroring and caching.
2. Description of the Related Art
As described in the related application, the use of the World Wide Web ("WWW") has grown dramatically and is expected to continue to grow as more businesses, government agencies, educational institutions, and private consumers become web users and web site owners. As network usage grows, Internet network bandwidth management has become an art of its own.
FIG. 1 shows the basic prior art topology of Internet servers, routers, and client browser systems. A web site server typically consists of a server computer (1), a local data storage for original site content (2), and a data interface (3) between the local data storage (2) and the server computer (1). Often, the local data storage is simply a file system on the disk subsystem of the server computer, but in the case of very large web sites, this local storage may be another network file server. In the former case, the data interface (3) is typically a high-speed computer bus such as Small Computer System Interface ("SCSI"). In the latter case, the data interface (3) may be a 100 Megabit/second Ethernet. The web site server computer (1) usually includes the necessary software suites, such as a web server suite and a web communications package (e.g. a TCP/IP stack), and possibly some security or firewall software. Finally, the web server interfaces to the World Wide Web (5), or Internet, through a data interface (4) such as a dial-up modem, a T1 data line, or a router over a local area network ("LAN"). Stored in the local data storage (2) are web pages, typically in Hyper Text Markup Language ("HTML"), and all related web objects used in those pages, such as graphics image files (e.g. GIF and JPEG files), video clips (e.g. AVI files), and audio clips (e.g. WAV files). This type of web server architecture is well-known within the art, as are all of the above-mentioned web object types.
At the other end of the "connection" are web browser computers (14). A web browser computer typically uses a local data interface (10), such as a dial-up modem or LAN, to interface to a local bridge or router (9), in the case of a corporate facility. The local bridge or router (9) relays Hyper Text Transfer Protocol ("HTTP") requests received from the web browser (14) to an Internet Service Provider point-of-presence ("ISP POP") via an upstream data connection (8) such as another dial-up modem, cable modem or T1 data line. The ISP POP router (7) relays the HTTP requests for documents to another upstream data connection (6) which reaches the backbone of the Internet (5). Routers within the Internet (5) relay the HTTP requests to the appropriate web site server based on Domain Name Service ("DNS") and Uniform Resource Locator ("URL") addressing schemes. The web site server computer (1) retrieves the requested page and web objects from its local data storage (2), and transmits them using the HTTP protocol through the Internet (5), the ISP POP router (7), and the local router (9) to the web browser computer (14). This arrangement and the technologies used to achieve it are also well-known within the art.
A bandwidth management problem arises which grows worse the further "upstream" in the network a piece of equipment is located. For example, the local bridge or router (9) may serve tens to even hundreds of browser systems. So, the data bandwidth sent and received on its upstream connection (8) to the ISP POP router (7) is the aggregation of the individual bandwidths of the downstream web browser systems. This aggregation occurs again at the ISP POP router level, as the ISP POP router (7) may serve many local downstream routers, so the data bandwidth requirement on the ISP POP router's upstream link (6) is the aggregation of the bandwidth demands of all the downstream local routers.
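The aggregation described above can be illustrated with a short worked calculation. This is only a sketch: the function name and the figures used (56 kbit/s per browser, 100 browsers per local router, 20 local routers per ISP POP) are illustrative assumptions and do not come from any measurement, and the calculation assumes the worst case in which every browser is active at once.

```python
def aggregate_demand(per_browser_kbps, browsers_per_router, routers_per_pop):
    """Return worst-case demand (kbit/s) on links (8) and (6).

    All three parameters are hypothetical inputs chosen for illustration.
    """
    # Demand on the local router's upstream connection (8):
    # the sum of all downstream browser demands.
    local_uplink_kbps = per_browser_kbps * browsers_per_router
    # Demand on the ISP POP router's upstream link (6):
    # the sum of all downstream local router demands.
    pop_uplink_kbps = local_uplink_kbps * routers_per_pop
    return local_uplink_kbps, pop_uplink_kbps


local_kbps, pop_kbps = aggregate_demand(56, 100, 20)
print(local_kbps, pop_kbps)  # 5600 112000
```

Under these assumed figures, a modest 56 kbit/s per browser becomes 5,600 kbit/s at the local router and 112,000 kbit/s at the ISP POP, which is why the demand is most severe at the upstream points of aggregation.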
The least sophisticated way to deal with this problem is to increase the data rates of the data interfaces on the upstream connections both at the ISP POP and at the local routers and bridges. However, this is very expensive, and it is only a temporary remedy because web site content is increasing in data size as web page content quickly migrates from primarily text and still graphic files towards text with audio and video clips. Additionally, this approach does not relieve a "bottleneck" problem which appears at the web site server data interface (4). If this interface is relatively slow for the number of requests, or "hits", it receives, the response to the web browsers will be slow even though the other downstream data links are high speed.
To address this problem, a number of Internet equipment manufacturers have introduced caching routers, such as the Cisco Cache Engine 500 Series and the NetApp C700 series from Network Appliance, Inc. Basically, a caching router is an enhanced router which incorporates its own local data storage. Based on "hit trends" on certain web sites and pages, the processor (11) in the caching router decides to store in its own local data storage (12) temporary copies of the pages being frequently requested. When those cached pages are requested again by a client downstream, the caching router retrieves the cached copy from its local storage (12) and transmits it to the requesting browser. This eliminates duplicate requests for the same page going upstream to the web server, and eliminates retransmission of the page from the server downstream through the routers (7 and 9) to the browser (14). The apparent response time of the web server is reduced as well. These kinds of caching routers are commonly deployed at any point of aggregation of bandwidth in the network, including corporate routers and bridges between the ISP connection and the corporate area networks, and ISP POP routers between the Internet backbone connections and the downstream links to end users.
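The caching behavior described above can be sketched as follows. This is a minimal illustration and not the algorithm of any particular commercial product: the class name, the `hit_threshold` parameter, and the `fetch_from_origin` callable are all assumptions, and a simple request count stands in for whatever "hit trend" analysis a real caching router performs.

```python
class CachingRouter:
    """Hypothetical model of a caching router's processor (11) and storage (12)."""

    def __init__(self, fetch_from_origin, hit_threshold=3):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> page
        self.hit_threshold = hit_threshold          # assumed popularity cutoff
        self.hit_counts = {}                        # url -> requests seen so far
        self.cache = {}                             # local data storage (12)
        self.upstream_requests = 0                  # requests relayed upstream

    def handle_request(self, url):
        # Serve from local storage when a cached copy exists.
        if url in self.cache:
            return self.cache[url]
        # Otherwise relay the request upstream to the origin web server.
        self.upstream_requests += 1
        page = self.fetch_from_origin(url)
        # Track popularity and begin caching once the page is requested often.
        self.hit_counts[url] = self.hit_counts.get(url, 0) + 1
        if self.hit_counts[url] >= self.hit_threshold:
            self.cache[url] = page
        return page


router = CachingRouter(lambda url: "content of " + url, hit_threshold=2)
for _ in range(5):
    router.handle_request("http://example.com/index.html")
print(router.upstream_requests)  # 2
```

In this sketch, five downstream requests for the same page result in only two upstream requests; the remaining three are answered from the router's local storage.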
One problem that arises with the use of caching routers is that the data may become stale over time. Most caching routers' algorithms have configurable retention times, after which a page is automatically deleted and, on the next request, a fresh copy is retrieved from the origin web server and stored. This allows the caching router to automatically manage its repository of copied pages, deleting in time any pages which are not requested very often and storing those which become requested at a greater frequency.
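A configurable retention time of this kind can be sketched as follows. The sketch is an assumption-laden illustration only: the class name, the `retention_seconds` parameter, and the injectable `clock` are hypothetical, expired entries are deleted lazily on the next request, and for brevity every page is cached on first request rather than only popular ones.

```python
import time


class ExpiringCache:
    """Hypothetical page cache with a configurable retention time."""

    def __init__(self, fetch_from_origin, retention_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> page
        self.retention_seconds = retention_seconds
        self.clock = clock                          # injectable for testing
        self.entries = {}                           # url -> (page, stored_at)

    def get(self, url):
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None:
            page, stored_at = entry
            # Still within the retention time: serve the cached copy.
            if now - stored_at < self.retention_seconds:
                return page
            # Retention time elapsed: delete the stale copy.
            del self.entries[url]
        # Retrieve a fresh copy from the origin server and store it.
        page = self.fetch_from_origin(url)
        self.entries[url] = (page, now)
        return page
```

Note that `get` returns only the page itself. Nothing in the returned value tells the requester when the copy was stored or how long it has been held.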
However, the person receiving the cached page at the web browser computer is typically not aware that the page was cached and therefore may not be up-to-date. The cached page contains no indication of when it was fetched from the original web server, and contains no indication of how long it has resided in the router's cache. While in many cases this may not be problematic, in many other cases it is a serious problem. For example, consider a day trader who is using a particular web site to retrieve "live" stock quotes or news releases as they occur. If he requests a web page which is very popular, it is likely he will receive a cached copy of the page instead of a fresh, up-to-date copy from the original web server. This may be even more misleading than the case without a caching router. If the router were not caching, the response to the page request might take considerable time, seconds or even minutes, which would give some indication to the user that the data may not be very current. But with the caching router, the response may be almost instantaneous, which may lead the user to think the data is very current.
Besides possible copyright issues that may arise from the unauthorized temporary copying of the original web site content into the caching router's memory, the web site operator may believe that he or she is suffering economic loss because his or her attempts to deliver near-real-time information via the Internet are being interfered with by the caching routers, and this caching is not announced to either the end user or the web site operator.
Therefore, there is a need in the art for an improved caching method and mechanism for web pages which allows notification of the caching time or period to the end user. There exists an additional requirement for an improved caching device which allows an end user to request a fresh or current copy of a cached page in order to allow requests from time-relevant web sites to be honored and processed appropriately. There exist strong technical and marketing needs for this enhanced caching method and mechanism to be implemented in technologies which are compatible with currently deployed web server, router and bridge technologies.