Transparent network proxies aim to operate in a manner invisible to clients and/or servers while providing a benefit to the client, the server, an intermediate entity, or some combination thereof, for example by improving user experience or reducing upstream bandwidth. Servers in such networks are generally identified by stable and often mnemonic names, but traffic is routed by allocated numeric addresses. Clients and intermediate nodes can locate servers by resolving names to addresses using a “name resolution service”. An example of such a service is the pervasive Domain Name System (DNS), which can be employed to resolve Internet domain names to Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6) addresses.
The mapping of name to address is not necessarily a one-to-one mapping. A given name may be serviced by multiple addresses, or conversely a single address may service many names. To be able to service many names from a single address, protocols may include name information in protocol messages. The Hypertext Transfer Protocol (HTTP) is an example of such a protocol in which a single IP address and service may provide content for a wide variety of service names. To facilitate this function, HTTP requests may include information about a domain name that is associated with the issued HTTP request.
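As a minimal sketch of how one address can serve many names, the following Python fragment reads the name from the Host header of an HTTP/1.1 request and uses it to select content. The site table, the names news.example and shop.example, and the helper functions are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical table: a single address serving content for several names.
SITES = {
    "news.example": "news front page",
    "shop.example": "shop front page",
}

def host_of(raw_request: bytes) -> Optional[str]:
    """Return the lower-cased Host header value, or None if it is absent.

    A port suffix, if any, is dropped. Bracketed IPv6 literals are not
    handled in this sketch.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower().partition(":")[0]
    return None

def serve(raw_request: bytes) -> str:
    """Pick content by requested name, regardless of the address contacted."""
    return SITES.get(host_of(raw_request), "unknown site")
```

Both names would typically resolve to the same address; only the name carried in the request distinguishes them.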
When a message is transparently intercepted, the transmitting client does not know that the intermediary device is participating in the handling of the protocol message. In this scenario, the client performs its own name resolution to determine the remote service address. When the intermediary device views a client message which is a request for a network resource (an object), the intermediary knows from application-level information which domain name the client is requesting resources from. The proxy also knows which address the client believes this name resolves to. However, the intermediary may not be able to audit the name resolution performed by the client, and thus the proxy cannot determine whether this client resolution is trustworthy. For example, a client could intentionally construct a request with a false implied name resolution to confuse the proxy, or the client might be subject to a name resolution poisoning attack in which a malicious intermediary agent provides a non-authoritative resolution to the client. Because the intermediary can only infer the name-to-address relationship from the protocol message, this type of mapping is termed the “implied client-resolved supplier address.”
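The inference described above might be represented as follows. This is an illustrative sketch only: the ImpliedMapping type, its verified flag, and the implied_mapping helper are hypothetical names, and the destination address is assumed to have been obtained from the intercepted connection by means not shown here.

```python
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class ImpliedMapping:
    name: str               # name taken from the application-level request
    address: str            # destination address the client chose to contact
    verified: bool = False  # the proxy did not observe the client's resolution

def implied_mapping(dst_address: str, headers: Mapping[str, str]) -> ImpliedMapping:
    """Pair the requested name with the address the client actually used."""
    for key, value in headers.items():
        if key.lower() == "host":
            return ImpliedMapping(name=value.strip().lower(), address=dst_address)
    raise ValueError("request carries no name information")
```

The verified flag defaults to False to reflect that the pairing is only inferred from the message and has not been audited.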
Caching intermediaries generally prefer to cache resources by name rather than by address for a variety of reasons: a single name might resolve to many addresses; addresses are generally much less stable than names; and there may be other, possibly protocol-specific, reasons. Caching solutions tend to pick the most global and most stable naming possible for an object without loss of specificity.
Given that objects are obtained by address but often cached by name, proxies form an association between the addresses used to fetch objects and the names used for caching. If this association is made by simply trusting the implied client-resolved supplier address, the proxy is open to cache poisoning attacks, in which clients forge requests to coerce the proxy into caching content from a non-authoritative source. For example, a request could be made for the front page image on a popular news site using the address of a personal server hosting obscene content. This type of attack could cause other users of the same proxy to retrieve the poisoned content, seemingly from the official host. The onus is on the caching intermediary to ensure that the cached content served to clients meets a set of security, correctness, and constraint parameters.
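The poisoning scenario can be reproduced with a toy cache that keys objects by name and fetches from whatever address the client used. Everything here is hypothetical: the addresses are documentation-range placeholders, fake_fetch stands in for a real upstream fetch, and NaiveNameCache is not any particular product's behavior.

```python
# Hypothetical origins, keyed by address (documentation-range IPs).
GENUINE = "192.0.2.10"
ATTACKER = "203.0.113.66"
ORIGINS = {
    GENUINE: {("news.example", "/front.png"): b"official image"},
    ATTACKER: {("news.example", "/front.png"): b"attacker image"},
}

def fake_fetch(address, name, path):
    """Stand-in for an upstream fetch: the server at `address` answers for `name`."""
    return ORIGINS[address][(name, path)]

class NaiveNameCache:
    """Caches by (name, path) and trusts whatever address the client used."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, client_address, name, path):
        key = (name, path)
        if key not in self._store:
            # The implied client-resolved supplier address is used unchecked.
            self._store[key] = self._fetch(client_address, name, path)
        return self._store[key]

cache = NaiveNameCache(fake_fetch)
first = cache.get(ATTACKER, "news.example", "/front.png")  # forged request
second = cache.get(GENUINE, "news.example", "/front.png")  # honest client
```

Because the forged request arrives first and populates the name-keyed entry, the honest client's later request is answered from the cache with the attacker's content even though that client contacted the genuine address.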
One conventional approach to name resolution is forced proxy resolution, in which the proxy re-resolves the name in each request using a trusted resolution source. Because the proxy relies only on its own resolution, the upstream source is always trusted, which avoids the cache object poisoning discussed above. However, the cost of name resolution for all proxied requests is potentially very high, and not only in terms of client response time: the resolution requirement adds complexity to proxy deployments and introduces external resources that must be carefully managed. Failure of these external services degrades or interrupts service for clients.
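A rough sketch of forced proxy resolution follows. It assumes the trusted resolution source can be modeled as a callable from name to address; trusted_resolve, fake_fetch, and the ForcedResolutionProxy class are hypothetical stand-ins rather than a real resolver or proxy API. The resolutions counter is included only to make the per-request cost visible.

```python
# Hypothetical data: one genuine origin, one attacker-controlled origin.
ORIGINS = {
    "192.0.2.10": {("news.example", "/front.png"): b"official image"},
    "203.0.113.66": {("news.example", "/front.png"): b"attacker image"},
}
TRUSTED_RECORDS = {"news.example": "192.0.2.10"}

def fake_fetch(address, name, path):
    """Stand-in for an upstream fetch from the server at `address`."""
    return ORIGINS[address][(name, path)]

def trusted_resolve(name):
    """Stand-in for a trusted resolver; raises LookupError when it cannot answer."""
    try:
        return TRUSTED_RECORDS[name]
    except KeyError:
        raise LookupError(f"cannot resolve {name}") from None

class ForcedResolutionProxy:
    def __init__(self, resolve, fetch):
        self._resolve = resolve
        self._fetch = fetch
        self._store = {}
        self.resolutions = 0  # counts the extra lookups the proxy performs

    def get(self, client_address, name, path):
        # The client's own choice of address is deliberately ignored.
        self.resolutions += 1
        trusted_address = self._resolve(name)  # a failure here interrupts service
        key = (name, path)
        if key not in self._store:
            self._store[key] = self._fetch(trusted_address, name, path)
        return self._store[key]
```

In this sketch every request, including one answered from the cache, pays for a resolution, and a resolver failure propagates to the client as an error.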
Another common approach to name resolution is parallel caching, which simply employs the implied client-resolved supplier address and caches a separate copy of the object for each address used to fetch the object upstream. This avoids the resolution cost and external dependencies of forced proxy resolution. Since each client receives content only according to its own resolution, cross-client poisoning attacks using the intermediary as the vector are not possible. A potentially negative consequence is that multiple identical copies of a given object may need to be cached, resulting in reduced caching benefit and increased storage needs.
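Parallel caching can be illustrated by adding the fetch address to the cache key. As before, this is a sketch under stated assumptions: the addresses, fake_fetch, and the ParallelCache class with its copies helper are hypothetical.

```python
# Hypothetical data: two genuine addresses for one name, plus an attacker.
ORIGINS = {
    "192.0.2.10": {("news.example", "/front.png"): b"official image"},
    "192.0.2.11": {("news.example", "/front.png"): b"official image"},
    "203.0.113.66": {("news.example", "/front.png"): b"attacker image"},
}

def fake_fetch(address, name, path):
    """Stand-in for an upstream fetch from the server at `address`."""
    return ORIGINS[address][(name, path)]

class ParallelCache:
    """Keeps one cached copy per (address, name, path)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, client_address, name, path):
        # The address is part of the key, so clients that resolved the name
        # differently never share a cache entry.
        key = (client_address, name, path)
        if key not in self._store:
            self._store[key] = self._fetch(client_address, name, path)
        return self._store[key]

    def copies(self, name, path):
        """Number of stored copies of the same named object."""
        return sum(1 for (_, n, p) in self._store if (n, p) == (name, path))
```

A forged request only populates the entry for the attacker's own address, so other clients are unaffected, but the two genuine addresses each hold an identical copy of the same object.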
Therefore, it is desirable to design a caching network intermediary device that addresses cache poisoning attacks effectively and efficiently.