This invention relates to selecting a cache that stores information received from a network site.
Computer networks such as the Internet provide users with a powerful tool for acquiring and distributing information. Since the emergence of the World Wide Web in the early 1990s, users have flocked to the Internet in growing numbers. The corresponding increase in network traffic, however, has increased the length of time users must wait to receive information. During busy periods, users commonly wait several minutes for complex Web-pages to load.
Many computers on the World Wide Web communicate using HTTP (HyperText Transfer Protocol). HTTP defines a client/server relationship between clients requesting resources (e.g., HTML (HyperText Markup Language) documents, audio, video, graphics, executable or interpreted instructions, and other information) and servers offering those resources. As shown in FIG. 1, a client 100 transmits a request for a resource 104 to a server 102 providing the resource 104. The server then transmits a response that can include the requested resource 104 along with other information such as any errors that may have occurred. Software running on the client 100 (e.g., a browser) can present the retrieved resource 104 to the user.
As shown in FIG. 2, an HTTP request 106 includes a URI (Universal Resource Identifier) 108 (e.g., a URL (Universal Resource Locator)) that identifies a requested resource 104 within a hierarchical location scheme. That is, the URI 108 describes a resource with increasing specificity, for example, first by identifying the domain 116 (e.g., www.domain.com) providing the requested resource 104, then by identifying the one or more directories 117 (e.g., "/directory/subdirectory") within the domain 116, and finally by identifying a file 118 (e.g., "filename.html") within the identified set of directories 117.
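The decomposition described above can be illustrated with a minimal Python sketch. The function name `split_uri` is hypothetical and is not part of the described method; the sketch assumes the standard library's URL parser is an acceptable stand-in for however a proxy actually parses the URI 108.

```python
import posixpath
from urllib.parse import urlsplit


def split_uri(uri):
    """Split a URI into (domain, directories, file), most general part first.

    Hypothetical helper for illustration only.
    """
    # urlsplit only recognizes the domain when a scheme is present,
    # so assume "http://" for scheme-less input.
    parts = urlsplit(uri if "://" in uri else "http://" + uri)
    # Separate the trailing file name from the directories that contain it.
    directories, file_name = posixpath.split(parts.path)
    return parts.netloc, directories, file_name


print(split_uri("http://www.domain.com/directory/subdirectory/filename.html"))
# prints ('www.domain.com', '/directory/subdirectory', 'filename.html')
```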
The HTTP request 106 also can include other information such as the type of client 110 making the request (e.g., a Microsoft® Internet Explorer browser), the preferred language of a user 112, and other information 114. A request 106 can vary in size from a few bytes to several kilobytes.
The exchange shown in FIG. 1 is a simplification of network communication. In fact, a request typically passes through many intermediate agents before reaching a server 102. One type of intermediate agent is a proxy 120. As shown in FIG. 3, a proxy 120 receives requests from a client 100 and optionally sends them on to the server 102 providing a requested resource. The proxy 120 receives the server's response 108 and can send the response 108 on to the client 100. The proxy 120 can perform many functions in addition to acting as a conduit for client 100/server 102 communication. For example, by examining information stored in requests and/or responses, the proxy 120 can act as a filter, for example, by intercepting mature content before it reaches a client 100 used by a child.
As shown in FIG. 4, many different users often request the same resource (e.g., Web-page). Thus, storing commonly requested resources in a cache 126 can reduce the amount of time it takes to provide a response to a request. As shown, a cache database table 128 stores client requests 130 and previously received server responses 132 to these requests 130. The table 128 also can store an expiration date 134 for a stored response 132 and other information 136. Different cache functions handle storage and retrieval of information from the cache.
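A cache table of this kind can be sketched as follows. This is an illustrative Python model only: the class and method names are hypothetical, the expiration date 134 is assumed to be stored as a Unix timestamp, and the request 130 is assumed to be reducible to a string key.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    response: bytes      # previously received server response (132)
    expires_at: float    # expiration date (134), assumed to be a Unix timestamp


class CacheTable:
    """Hypothetical in-memory stand-in for the cache database table 128."""

    def __init__(self):
        self._entries = {}

    def store(self, request_key, response, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._entries[request_key] = CacheEntry(response, now + ttl_seconds)

    def lookup(self, request_key, now=None):
        """Return the stored response, or None if it is absent or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(request_key)
        if entry is None:
            return None
        if entry.expires_at <= now:
            # Drop stale entries so they are not served again.
            del self._entries[request_key]
            return None
        return entry.response
```

The optional `now` argument exists only so that expiration can be exercised without waiting for real time to pass.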
As shown in FIG. 5, a proxy 120 (e.g., a proxy at an ISP (Internet Service Provider)) initially receiving a request can forward the request to a cache proxy 124 that includes a cache database 126 and instructions that implement cache functions 125. These functions 125 can search, read, and write the cache database 126. When the cache proxy 124 receives a request, the cache proxy 124 searches the cache database 126 for a corresponding response.
Referring to FIG. 6, if a response corresponding to the request previously has been stored in the cache 126, the cache proxy 124 can return the response without accessing the server 102 from which the requested resource originally came. Eliminating transmission of the request from the proxy 120 to the server 102 and the corresponding transmission of a response from the server 102 back to the proxy 120 reduces client 100 access times and network traffic.
As shown in FIG. 7, if the cache 126 does not store a previous response to a request, the cache proxy 124 transmits a request to the server 102. The cache proxy 124 also can transmit a request to the server 102 if the request includes a "pragma=no-cache" directive indicating that the response provided should not be retrieved from a cache. Regardless of whether a cache search failed or a request included a "pragma=no-cache" directive, the cache proxy 124 may store the response provided by the server 102 for future use.
As shown in FIG. 8, a proxy 120 may access multiple cache proxies 124, 138, 140, for example, cache proxies collected within the same ISP 122. This capability enables a single proxy 120 to access a very large number of cached responses. The proxy 120 routes a request received from a client to one of the cache proxies 124, 138, 140 by hashing (e.g., transforming information into a number) the domain 116 included in the URI 108 of the request. For example, hashing a domain of "www.a.com" may yield a "1" while hashing a domain of "www.b.com" may yield a "2." These requests can be sent to cache proxy 124 and 138, respectively. This scheme collects the different resources provided by the same domain into the same cache proxy. For example, every resource requested from "www.a.com", such as "www.a.com/a.html", shares the same domain and therefore resides on the same cache 124.
As described above, a cache proxy 124, 138, 140 may not previously have cached a response corresponding to a particular request. In such a case, the cache proxy 124 transmits the request to the server providing a particular resource. For example, as shown, a request for "www.c.com/c" is routed to cache proxy #2 140 based on the request's URI domain information ("www.c.com"). The cache proxy 140, however, must transmit the request to the server 102 providing the resource since the cache does not yet store "www.c.com/c". Upon receipt of the response, the cache proxy 140 can store "www.c.com/c" in its cache for future use.
To summarize, as shown in FIG. 9, a proxy 120 using multiple cache proxies receives a request 142 and performs 144 a hash function on the domain information included in the URI of the request. Based on the hash results, the proxy 120 transmits 146 the request to one of the cache proxies 124, 138, and 140.
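The domain-based routing summarized above might be sketched as follows. The text does not specify a hash function, so SHA-256 reduced modulo the number of cache proxies is an assumption made here for the sake of a deterministic example; the proxy labels and function names are likewise hypothetical.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical labels standing in for cache proxies 124, 138, and 140.
CACHE_PROXIES = ["cache-proxy-124", "cache-proxy-138", "cache-proxy-140"]


def bucket(key, count):
    """Map a string to a number in range(count) (assumed hash: SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % count


def select_by_domain(uri, proxies=CACHE_PROXIES):
    """Choose a cache proxy from the domain portion of the URI only."""
    domain = urlsplit(uri).netloc
    return proxies[bucket(domain, len(proxies))]


# Both requests name the same domain, so both reach the same cache proxy.
print(select_by_domain("http://www.a.com/a.html"))
print(select_by_domain("http://www.a.com/b.html"))
```

Because only the domain enters the hash, every resource of a given domain is directed to a single cache proxy regardless of its directory or file name.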
The cache proxy 124, 138, 140 receiving 148 the request can determine whether to search its cache 150. If the cache proxy searches 160 and finds 162 a response corresponding to the request in its cache, the cache proxy 124, 138, 140 can return 164 the found response to the proxy 120. If the cache proxy decided 150 not to search its cache or failed 162 in its search for the request, the cache proxy sends 166 the request on to the server identified by the request URI. After the cache proxy receives the response, the cache proxy can determine 168 whether to store 170 the response in its cache to speed future requests. The cache proxy then returns 172 the received response to the proxy 120 for transmission to the client making the request.
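The cache proxy's handling of a request, as described for FIG. 9, can be sketched roughly as below. The sketch makes several assumptions that are not in the text: the cache is modeled as a plain dictionary keyed by URI, the no-cache directive is read from a `Pragma` request header, every fetched response is stored, and `fetch_from_server` is a hypothetical callable standing in for the transmission to the server.

```python
def handle_request(uri, headers, cache, fetch_from_server):
    """Return a response for uri, consulting the cache where permitted."""
    # Decide whether to search the cache (150): skip it when the request
    # carries a no-cache directive.
    bypass_cache = headers.get("Pragma", "").strip().lower() == "no-cache"

    # Search the cache (160) and return a found response (162, 164).
    if not bypass_cache and uri in cache:
        return cache[uri]

    # Cache skipped or search failed: send the request on to the server (166).
    response = fetch_from_server(uri)

    # Store the response for future requests (168, 170). A real cache proxy
    # could decline to store; this sketch always stores.
    cache[uri] = response

    # Return the received response toward the client (172).
    return response
```

Passing the server access in as a callable keeps the sketch free of any real network dependency.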
The present inventors recognized that the method of distributing responses among caches described above can result in a distribution that underutilizes the caches.
In general, in one aspect, a method of selecting one of a plurality of caches that store information received from at least one network site includes receiving information that identifies the location of a resource within a domain and selecting a cache based on the information that identifies the location of the resource within the domain.
Embodiments may include one or more of the following features. Receiving information may include receiving a request such as an HTTP (HyperText Transfer Protocol) request. The information may include a URI (Universal Resource Identifier) (e.g., a URL (Universal Resource Locator)) identifying the location of a resource. Selecting a cache may be based on the domain of the resource in addition to the location of a resource within the domain. Selecting a cache may include use of a hashing function. Selecting a cache may include selecting a cache proxy. The method may also include sending a request to the selected cache proxy.
The information identifying the location of a resource within a domain can include one or more directories and/or a file name.
In general, in another aspect, a method of selecting one of a plurality of caches that store information received from a network site includes receiving information that identifies a location of a resource expressed using a hierarchical location scheme that includes identifiers corresponding to different hierarchical levels, and selecting a cache based on information identifiers that correspond to more than one hierarchical level.
Embodiments may include one or more of the following features. A hierarchical level may be a domain. A hierarchical level may be the location of a resource within a domain.
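As a rough illustration of selecting on identifiers from more than one hierarchical level, the following Python sketch splits a URI into its levels and hashes a chosen number of them. The function names, the `levels` parameter, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from urllib.parse import urlsplit


def hierarchy_levels(uri):
    """List the identifiers of a URI from most general to most specific."""
    parts = urlsplit(uri)
    return [parts.netloc] + [seg for seg in parts.path.split("/") if seg]


def select_cache(uri, cache_count, levels):
    """Pick a cache index from the first `levels` hierarchical identifiers."""
    key = "/".join(hierarchy_levels(uri)[:levels])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cache_count


uri = "http://www.domain.com/directory/subdirectory/filename.html"
print(hierarchy_levels(uri))
# prints ['www.domain.com', 'directory', 'subdirectory', 'filename.html']
```

With `levels=1` only the domain is hashed; larger values bring directories and, eventually, the file name into the selection.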
In general, in another aspect, a method of selecting one of a plurality of caches that store information received from a network site includes receiving information that identifies a location of a resource; and selecting a cache based on all the received information identifying the location of the resource.
In general, in another aspect, a method of selecting one of a plurality of caches that store information received from at least one network site includes receiving an HTTP (HyperText Transfer Protocol) request that includes a URI (Universal Resource Identifier) that identifies the location of a resource within a domain and selecting a cache proxy by hashing the URI domain and URI information that identifies the location of the resource within the domain. The method further includes sending a request to the selected cache proxy.
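One way this aspect could look in practice is sketched below in Python. The hash (SHA-256 reduced modulo the number of cache proxies), the function names, and the `send` callable that stands in for transmitting the request are all assumptions for illustration; the text does not prescribe them.

```python
import hashlib
from urllib.parse import urlsplit


def select_cache_proxy(uri, proxies):
    """Choose a cache proxy by hashing the domain together with the path."""
    parts = urlsplit(uri)
    key = parts.netloc + parts.path  # domain plus location within the domain
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return proxies[int.from_bytes(digest[:8], "big") % len(proxies)]


def route_request(uri, proxies, send):
    """Select a cache proxy for the request and send the request to it."""
    proxy = select_cache_proxy(uri, proxies)
    return send(proxy, uri)
```

Repeated requests for the same URI always select the same cache proxy, so a response cached once can be found again, while different resources of one domain are free to land on different cache proxies.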
In general, in another aspect, a computer program product, disposed on a computer readable medium, for selecting one of a plurality of caches that store information received from at least one network site, includes instructions for causing a processor to receive information that identifies the location of a resource within a domain, and select a cache based on the information that identifies the location of the resource within the domain.
In general, in another aspect, a system for handling requests for information provided by a network server includes a plurality of cache proxies and a front-end proxy. The front-end proxy includes instructions for causing the front-end proxy processor to receive information that identifies the location of a resource within a domain, and select a cache based on the information that identifies the location of the resource within the domain.
Advantages may include one or more of the following. Performing a hash that includes the resource information of a URI spreads storage of resources provided by a particular domain across multiple caches. Because a handful of domains receive the lion's share of requests (e.g., "www.aol.com"), spreading the resources provided by these domains over multiple caches enables more efficient use of the caches as each cache reads and writes a substantially equal number of requests and responses. Thus, no one cache becomes overloaded with request processing while other caches remain underutilized.
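The effect on load distribution can be explored with a small Python experiment. The workload below (300 synthetic pages of one popular domain), the three caches, and the SHA-256 based hash are assumptions chosen purely for illustration; they are not measurements from any real system.

```python
import hashlib
from collections import Counter
from urllib.parse import urlsplit


def bucket(key, count):
    """Map a string to a cache index in range(count) (assumed hash: SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % count


def load_per_cache(uris, cache_count, key_of):
    """Count how many of the given URIs each cache would receive."""
    return Counter(bucket(key_of(uri), cache_count) for uri in uris)


# Synthetic workload: many different pages of a single popular domain.
uris = ["http://www.aol.com/page%d.html" % i for i in range(300)]

by_domain = load_per_cache(uris, 3, lambda u: urlsplit(u).netloc)
by_full_uri = load_per_cache(
    uris, 3, lambda u: urlsplit(u).netloc + urlsplit(u).path
)

print("domain only:", dict(by_domain))    # all 300 requests on one cache
print("domain + path:", dict(by_full_uri))  # requests spread over the caches
```

Hashing the domain alone sends the entire workload to one cache, whereas including the path lets the same workload be shared among the caches.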
Modifying the instructions of a proxy instead of modifying the instructions executed by cache proxies reduces the difficulty of incorporating these techniques into an existing network configuration.