1. Technical Field
The present invention relates, in general, to an improved method and system for accessing the most recent version of a requested data file that has been downloaded into a private network from a source external to the private network. In particular, the present invention relates to an improved method and system for accessing the most recent version of a requested data file that has been downloaded into a private network from a source external to the private network by utilizing a "common cache" which is implemented within a collection of hierarchically distributed computers within the private network.
2. Description of Related Art
The present invention is directed toward the improvement of the efficiency of prior art methods and systems used with internetworks. An internetwork is an informal collection of packet-switching networks that is (a) interconnected by gateways, (b) uses protocols allowing it to function as a single, large, virtual network, (c) consists of an interconnection of individual personal computers and computer networks each of which belongs to a public or private entity, such as a person, corporation, university, government agency, or laboratory, and (d) uses existing telecommunications facilities to establish interconnections. M. Weik, Communications Standard Dictionary 475 (3rd ed. 1996).
The most well known internetwork is the public network merely referred to as "the Internet." The Internet is the formal collection of networks and gateways that (1) includes, among others, the military network (MILNET) and the National Science Foundation network (NSFNET); (2) uses the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite; (3) functions as a single, virtual network; and (4) provides global connectivity. Id. at 474. Since its inception, the Internet has continued to grow rapidly. In early 1995, access was available in 180 countries and there were more than 30 million users. It is expected that 100 million computers will be connected via the public Internet by 2000, and even more via enterprise internets. The technology and the Internet have supported global collaboration among people and organizations, information sharing, network innovations, and rapid business transactions. "Internet," Microsoft® Encarta® 96 Encyclopedia. © 1993-1995 Microsoft Corporation. All rights reserved.
The Internet essentially provides a mechanism whereby a packet-switched data communications channel is established through a packet-switching network between two machines. The Internet is centrally managed, and only computers which appear within the Domain Name System (DNS) can actually establish a true, non-virtual, connection to the Internet. The DNS is actually an online distributed database that runs on Internet servers throughout the Internet and maps human readable addresses into Internet Protocol addresses. Computers that connect directly into the Internet are known as "hosts" and carry "domain names" which uniquely identify them as true nodes within the Internet. M. Weik, Communications Standard Dictionary 262 (3rd ed. 1996).
Each Internet "host" generally has a number of "client" computers attached to it, and functions as a "server" for those computers, with the service provided being Internet access. In communications networks, the terms "client" and "server" refer to client-server architecture, which is a network-based system that uses client software running on one computer to request a specific service and uses corresponding server software running on a second computer to provide access to a shared resource managed by the second computer. M. Weik, Communications Standard Dictionary 137 (3rd ed. 1996). The server then connects to a resource, such as the Internet, which provides the specific service requested. The server thus allows efficient sharing of a resource. Thus, the terms "client" and "server" are context dependent, and a computer that functions as a "client" in one context (such as a computer requesting Internet access) may function as a "server" in another context (such as where that same computer has a number of computers attached to it and functions as a network resource manager for those attached computers). This same relationship can extend indefinitely, with each attached computer capable of functioning either as a "client" or as a "server", depending upon the computers connected to it and the software architecture on those computers.
This hierarchical relationship has been recognized in the DNS, and consequently a human readable DNS name generally has the structure of a sequence of labels separated by periods, with each label generally specifying a computer (although not always, in that sometimes the labels map to drives or files), and with a file extender being appended to the last label, which corresponds to a specific file on the computer being accessed. For example, a human readable DNS name might be nic.ddn.mil/somefile, in which nic (Network Information Center) is the name of the host computer connected to the Internet, ddn (Defense Data Network) is the subdomain corresponding to a computer connected to the primary domain computer, mil (MILNET) is the primary domain corresponding to a computer connected to the host computer, and somefile is a file residing on ddn. M. Weik, Communications Standard Dictionary 262 (3rd ed. 1996). Furthermore, each of these domains and subdomains could merely be drives or files as discussed.
Use of the DNS allows computer to computer connection, but once so connected how are the machines to understand each other? In order to understand each other, the machines on either side of the connection must "speak the same language." Toward this end, application programs were developed which ran upon the computers communicating over the Internet and which allowed the computers on either end of the channel to understand each other. For example, an application called "Gopher" allows users to create and use computer file directories. This service is linked across the Internet to allow other users to browse files. Another application program called File Transfer Protocol (FTP) allows users to transfer computer files easily between host computers. This is still the primary use of the Internet, especially for software distribution, and many public distribution sites exist. The Telnet application allows users to log in to another computer from a remote location. However, all of these preceding referenced protocols generally are meant to connect two computers and allow them to communicate. "Internet," Microsoft® Encarta® 96 Encyclopedia. © 1993-1995 Microsoft Corporation. All rights reserved. In 1989 applications to connect computers took a quantum leap forward.
In 1989 the World Wide Web (WWW) was developed by English computer scientist Timothy Berners-Lee to enable information to be shared among internationally dispersed teams of researchers at the European Organization for Nuclear Research (CERN) facility near Geneva, Switzerland. Although the name World Wide Web would seem to indicate that the WWW is a network, it is not. The WWW is actually an application program which runs on individual computers and that creates connections to multiple different source computers over one or more networks. All WWW computer files are formatted using Hypertext Markup Language (HTML), and WWW communication among computers occurs using the Hypertext Transfer Protocol (HTTP). A computer file formatted in HTML is called a "web page" in WWW parlance.
In WWW parlance, connections established between different computers are termed "links." Users interact with computers running WWW software by utilizing an application program known as a WWW browser (e.g. Netscape Navigator). A WWW browser program allows a file formatted in HTML/HTTP format (i.e. a "web page") to be displayed on a computer screen as an agglomeration of text, images, sound, or other visual objects, which can appear as highlighted text or graphics, and which are in actuality subprograms to establish communications links with other machines internetworked and running WWW software. The user can navigate through information by using a mouse and pointing and clicking on such visual objects on the screen, which will establish a link with another computer over the network and retrieve and display a file formatted in HTML by using the HTTP protocol. Thus, the innovation of the WWW was that the creation of HTML/HTTP formatted files allowed the display of information on a computer screen "as if" it were resident on one computer, while in actuality the information may be distributed in many different files on many different computers. It is important to remember that the HTTP/HTML scheme only refers to making internetworked computers speak the same language, and that actual network communication normally occurs over the Internet or other networks using standard network protocols, such as TCP/IP or OSI protocols.
Thus, in order to effectively utilize the Internet one needs to both establish a connection over the Internet, and to specify a protocol whereby the computers know how to communicate. In accessing Internet services this is accomplished by use of what is known as the universal resource locator (URL). The URL has two basic components, the protocol to be used, and the object pathname. For example, the URL "http://www.uspto.gov" (home page for the U.S. Patent & Trademark Office) specifies a hypertext transfer protocol (HTTP) and a pathname of the server ("www.uspto.gov"). The server name is associated with a unique numeric value (TCP/IP address).
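The two URL components described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper name describe_url is hypothetical and not part of the described system, and the sketch splits the example URL with Python's standard urllib.parse module rather than contacting any server.

```python
from urllib.parse import urlsplit

def describe_url(url: str) -> dict:
    """Split a URL into the protocol and the object pathname parts.

    describe_url is a hypothetical helper used only for illustration.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. "http"
        "server": parts.hostname,    # e.g. "www.uspto.gov"
        "path": parts.path or "/",   # an empty path is treated as the root
    }

info = describe_url("http://www.uspto.gov")
print(info)
```

Translating the server name into its numeric address is a separate step that requires a name lookup over the network, so it is deliberately left out of this sketch.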
The foregoing concepts can be made more clear by reference to FIG. 1 which depicts a high-level schematic view of a private network interfacing with an external network. FIG. 1 is also a schematic diagram of how networked computers can connect through hosts to the Internet and then communicate to other computers through the Internet. The Internet 10 is shown as a network cloud. Internet host 12 is depicted as a mainframe computer. Internet host 12 functions as a network server for the network composed of client computers 20-22. Computer 20 functions as network server for the network composed of client computers 30-32. Computer 22 functions as network server for the network composed of client computers 33-35. Computer 31 functions as network server for the network composed of client computers 40-43. Computer 34 functions as network server for the network composed of computers 44-47. Computer 50 functions as a network server for the network composed of client computers 61-64. Computer 46 functions as a gateway between network servers 34 and 50. Furthermore, each network server just referenced functions as a gateway from its lower attached network into the network above it.
Internet host 12A is depicted as a mainframe computer. Internet host 12A functions as a network server for the network composed of client computers 20A-22A. Computer 20A functions as network server for the network composed of client computers 30A-32A. Computer 22A functions as network server for the network composed of client computers 33A-35A. Computer 32A functions as network server for the network composed of client computers 40A-43A. Computer 34A functions as network server for the network composed of computers 44A-47A. Furthermore, each network server just referenced functions as a gateway from its lower attached network into the network above it.
FIG. 1 can be used to illustrate the concepts discussed above. Assume personal computer 40A is running a web browser application program. Suppose that personal computer 40A's user wants to access a "web page" (computer file formatted in Hypertext Markup Language (HTML)) via a "link" that appears in highlighted fashion on personal computer 40A's screen. When personal computer 40A's user activates the "link," assume personal computer 40A specifies via a URL that the "web page" corresponding to the displayed "link" actually corresponds to a data file resident on computer 62. Thus, in order to access the desired "web page," personal computer 40A must use its private network protocol to negotiate with its server 32A and indicate that it desires delivery of information to Internet host computer 12A. Server 32A must then serve as a gateway and communicate to its network server 20A and negotiate with server 20A to pass personal computer 40A's information to Internet host 12A. Server 20A must then establish communication with Internet host 12A and then pass personal computer 40A's information to Internet host 12A.
Once Internet host 12A receives the URL from personal computer 40A, it must translate the human readable URL into Internet Protocol format, and then establish a connection with a second Internet host 12 and send the information. Internet host 12 then negotiates with network server 22 and subsequently passes the information. Network server 22 negotiates with network server 34 and subsequently passes the information. Network server 34 negotiates with network gateway 46 and subsequently passes the information. Network gateway 46 negotiates with network server 50 and subsequently passes the information. Network server 50 then negotiates with personal computer 62 and delivers the information to same. Once personal computer 62 has received the information, it recognizes it as a request for a specific HTML file resident within it, retrieves that file, and returns the requested HTML file ("web page") to requesting computer 40A through the network in a fashion similar to the one in which the request for the file was sent.
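The hop sequence just described can be traced with a small sketch. It is an illustration only: the table below encodes the server/client relationships as the text describes them for FIG. 1, the names PARENT, attach, path_to_host, and route are hypothetical, and the sketch assumes that the requesting and target computers sit beneath different Internet hosts, as computers 40A and 62 do.

```python
from typing import Dict, List

# Maps each computer to the server (or gateway) directly above it.
PARENT: Dict[str, str] = {}

def attach(server: str, clients: List[str]) -> None:
    """Record that each client reaches the rest of the network via server."""
    for client in clients:
        PARENT[client] = server

# Far side of the Internet, under Internet host 12.
attach("12", ["20", "21", "22"])
attach("20", ["30", "31", "32"])
attach("22", ["33", "34", "35"])
attach("31", ["40", "41", "42", "43"])
attach("34", ["44", "45", "46", "47"])
attach("46", ["50"])                      # 46 is the gateway between 34 and 50
attach("50", ["61", "62", "63", "64"])

# Near side, under Internet host 12A.
attach("12A", ["20A", "21A", "22A"])
attach("20A", ["30A", "31A", "32A"])
attach("22A", ["33A", "34A", "35A"])
attach("32A", ["40A", "41A", "42A", "43A"])
attach("34A", ["44A", "45A", "46A", "47A"])

def path_to_host(node: str) -> List[str]:
    """Climb from a computer up to its Internet host."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def route(src: str, dst: str) -> List[str]:
    """Up to the source's host, across the Internet, down to the target."""
    return path_to_host(src) + path_to_host(dst)[::-1]

hops = route("40A", "62")
print(" -> ".join(hops))
# prints: 40A -> 32A -> 20A -> 12A -> 12 -> 22 -> 34 -> 46 -> 50 -> 62
```

Every arrow in the printed route is a separate negotiation, and the reply retraces the same ten machines in reverse.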
Each time that the networked computers communicated in the above scenario, the computers first had to establish communications links between themselves and pass the information. Each time the information was passed, the information had appended to it headers and trailers to ensure that it arrived at the correct location. Stallings W., Data and Computer Communications 245-278 (1985). Each communication requires both communications channel bandwidth consumption and computer processing overhead.
Although the above scenario only addresses one computer-computer communication, with WWW software many such connections are envisioned to be occurring simultaneously. Such simultaneous activity consumes considerable communications channel bandwidth and processing capacity.
The problems arising from such redundant data links have been recognized previously. In order to reduce the redundant links and unnecessary internetwork traffic, the previous solution has been to create local data caches on end client user machines, data caches on internetwork gateways, and data caches on the Internet hosts.
FIG. 2 illustrates the major components used to enact the previous solution. Shown is Internet host 12A to which are attached a first network with network server 20A and network members 30A-32A. Also shown attached to Internet host 12A is a second network with network server 22A and network members 33A-35A. In the previous solution, Internet host 12A creates an Internet host cache 12AC of Internet files which are frequently downloaded through it from the Internet. In addition, network server 22A also creates a cache 22AC of Internet files which are frequently downloaded through it. Furthermore, the application program, such as a web browser 33AB, which for sake of illustration is shown running on client computer 33A also creates a local cache 33AC of Internet files which the web browser has recently accessed.
In operation, a user (not shown) of the web browser application 33AB running on client computer 33A requests access to a specific data file 10F. The web browser first checks its local cache 33AC for the requested specific data file. If web browser 33AB finds such file is resident it retrieves the cached file and utilizes it in the manner requested by user (not shown). If the web browser 33AB either finds no such specific data file 10F resident in local cache 33AC (as is the case shown in FIG. 2) or finds such resident file is too old to be reliable, web browser 33AB requests specific data file 10F from Internet host 12A through network server 22A, which in this instance functions as a network gateway.
If the requested specific data file 10F is resident in Internet host cache 12AC, then Internet host 12A delivers a copy of the cached file to web browser 33AB through network server 22A, which in this instance functions as a network gateway. If Internet host 12A either finds no such specific data file 10F resident in Internet host cache 12AC or finds such resident file is too old to be reliable, Internet host 12A establishes contact, via the Internet, with the computer (not shown) wherein specific data file 10F is actually resident and retrieves the file in the fashion discussed above with respect to computer-computer communication via the Internet. Once Internet host 12A receives specific data file 10F it delivers such to web browser 33AB through gateway/network server 22A.
Each time an often requested specific data file transits through a network server/gateway, such as 22A, the network server creates a cached copy such as 22AC of the specific data file. Thereafter, specific data file requests through the network server/gateway 22A are first checked against this cache to determine if the specific data file is resident, much in the fashion described regarding when web browser 33AB requests a specific data file from Internet host 12A.
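The tiered lookup described above (browser cache 33AC, then network server cache 22AC, then Internet host cache 12AC, then the computer where the file actually resides) can be sketched as follows. This is a minimal sketch under assumptions, not the prior art's actual implementation: the class names Origin and Cache, the max_age freshness rule, and the explicit now argument are all hypothetical, and for simplicity every file that passes through a tier is cached rather than only the often requested ones.

```python
from typing import Dict, Tuple

class Origin:
    """Stands in for the remote computer on which the file actually resides."""
    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)
        self.requests = 0          # counts retrievals made across the Internet

    def get(self, name: str, now: float) -> str:
        self.requests += 1
        return self.files[name]

class Cache:
    """One cache tier; on a miss or a stale entry it asks the tier above."""
    def __init__(self, label: str, upstream, max_age: float) -> None:
        self.label = label
        self.upstream = upstream
        self.max_age = max_age     # assumed rule for "too old to be reliable"
        self.entries: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str, now: float) -> str:
        entry = self.entries.get(name)
        if entry is not None and now - entry[1] <= self.max_age:
            return entry[0]        # fresh copy is resident: serve it locally
        content = self.upstream.get(name, now)
        self.entries[name] = (content, now)
        return content

origin = Origin({"10F": "contents of 10F"})
host_12ac = Cache("12AC", origin, max_age=5.0)
server_22ac = Cache("22AC", host_12ac, max_age=5.0)
browser_33ac = Cache("33AC", server_22ac, max_age=5.0)

first = browser_33ac.get("10F", now=0.0)    # misses every tier, reaches origin
second = browser_33ac.get("10F", now=1.0)   # served from local cache 33AC
print(origin.requests)
# prints: 1
```

Note that each tier judges freshness only against its own copy, with no notification when a tier above it obtains a newer version.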
FIG. 3 shows a problem that exists within this prior art solution. At time t1 a user (not shown) of web browser 33AB requests specific data file 12F. Since specific data file 12F is not resident in local cache 33AC, web browser 33AB then requests specific data file 12F from Internet host 12A. Internet host 12A consults Internet host cache 12AC, finds specific data file 12F.sub.t1 (where the subscript t1 indicates the version of specific data file 12F resident in Internet host cache 12AC at time t1) resident and delivers a copy of the cached file to web browser 33AB. At time t1+10 seconds, another web browser 30AB, running on computer 30A, requests the same specific data file 12F, which again is not resident in local cache 30AC, from Internet host 12A through network server/gateway 20A. However, upon receipt of this request and at this new time t1+10 seconds, Internet host 12A determines that the cached version of specific data file 12F (that cached version being 12F.sub.t1) is too old and therefore initiates communication, over the Internet, with computer 35 (upon which data file 12F is actually resident) and requests a new copy of file 12F from computer 35. Upon receipt of newly retrieved specific data file 12F.sub.t1+10 (where the subscript t1+10 indicates the version of specific data file 12F received by Internet host 12A from computer 35 in response to the query made at time t1+10), Internet host 12A passes a copy of the file through network server/gateway 20A to web browser 30AB, and then caches the newly retrieved version of specific data file 12F.sub.t1+10. Furthermore, at this point network server/gateway 20A determines that specific data file 12F.sub.t1+10 is oft requested and thus decides to store an image of the file in network server cache 20AC (not shown).
The problem that exists within the prior art solution is now readily apparent. A newer version (specific data file 12F.sub.t1+10) is now locally resident in both Internet host cache 12AC and network server cache 20AC, but web browser 33AB has in its cache the older version of the file (12F.sub.t1). Since it is very possible, and even likely, that the newly retrieved specific data file 12F.sub.t1+10 contained data different from earlier retrieved specific data file 12F.sub.t1, it is important that when such a newly retrieved specific data file 12F.sub.t1+10 becomes locally resident, such newly retrieved file contents be disseminated throughout the system to all clients who may be interested in such data. However, under the prior art solution this updated information is not supplied to the local caches of web browsers running on client computers.
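The FIG. 3 timeline can be replayed with plain dictionaries to show where the stale copy ends up. This is a sketch under assumptions: the version strings, the helper stale_holders, and the premise that the file changed at its source between t1 and t1+10 are all hypothetical choices made for illustration.

```python
from typing import Dict, List

def stale_holders(caches: Dict[str, Dict[str, str]], name: str, newest: str) -> List[str]:
    """Return the labels of caches that hold a copy older than newest."""
    return sorted(
        label for label, cache in caches.items()
        if name in cache and cache[name] != newest
    )

# The source computer and the four caches named in the text.
origin = {"12F": "12F@t1"}
caches: Dict[str, Dict[str, str]] = {
    "12AC": {"12F": "12F@t1"},   # Internet host cache already holds 12F at t1
    "20AC": {},                  # network server cache
    "30AC": {},                  # local cache of web browser 30AB
    "33AC": {},                  # local cache of web browser 33AB
}

# Time t1: web browser 33AB is served from Internet host cache 12AC.
caches["33AC"]["12F"] = caches["12AC"]["12F"]

# Assumption: the file changes at its source before the next request.
origin["12F"] = "12F@t1+10"

# Time t1+10: host 12A finds its copy too old, refetches, and the new
# version is cached at 12AC, 20AC, and 30AC on its way to browser 30AB.
caches["12AC"]["12F"] = origin["12F"]
caches["20AC"]["12F"] = caches["12AC"]["12F"]
caches["30AC"]["12F"] = caches["12AC"]["12F"]

stale = stale_holders(caches, "12F", origin["12F"])
print(stale)
# prints: ['33AC']
```

Nothing in the replay ever writes to 33AC after time t1, which is the gap the need statement that follows is directed at.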
In light of the foregoing, it is apparent that a need exists for a method and system which efficiently disseminates throughout application programs running on computers within one or more private networks the newest version of a specific file which has been downloaded into the one or more private networks from a source (such as the public Internet or other private networks) external to the one or more private networks.