A proxy server is a software application (sometimes embodied in a separate computer-based apparatus) that is located logically between a client application, such as a Web browser, and a content source, such as a Web server. The proxy server intercepts requests from the client to the content source to determine whether it can fulfill those requests itself. If it cannot, the proxy server forwards the request to the content source. In general, proxy servers serve two main purposes: reducing latency and filtering requests.
For example, proxy servers can improve response times for client requests by offloading content from a content source and positioning that content closer (logically and, in some cases, physically) to one or more users. To illustrate this behavior, consider the case where two users, A and B, access the World Wide Web (the graphical interface of the Internet) through a common proxy server. If user A requests a certain Web page, that request will pass through the proxy server. Assuming the proxy does not already store a copy of the requested Web page, it will forward the request to the applicable origin server. When the requested content is returned, the proxy will store a copy of the Web page before sending it on to user A. Later, when user B requests the same Web page, the proxy server will simply return the copy that it stored while fetching the page for user A. If the proxy server is on the same network as user B, as is often the case, this will generally be much faster than sending the new request all the way to the origin server and back.
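The caching behavior described for users A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular proxy: responses are keyed only by URL and kept indefinitely (a real proxy would honor HTTP cache-control rules and expiry), and the names `CachingProxy`, `fetch_from_origin`, and `fake_origin` are hypothetical stand-ins for the proxy and the network round trip to the origin server.

```python
from typing import Callable, Dict


class CachingProxy:
    """Minimal caching proxy sketch: serves repeat requests from a local store.

    Assumption: responses are cacheable forever and keyed only by URL; a real
    proxy would honor HTTP cache-control headers, expiry, and validation.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical callable standing in for the trip to the origin server.
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_fetches = 0

    def handle_request(self, url: str) -> bytes:
        # Cache hit: return the stored copy without contacting the origin.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: forward to the origin, store a copy, then reply.
        self.origin_fetches += 1
        content = self._fetch_from_origin(url)
        self._cache[url] = content
        return content


def fake_origin(url: str) -> bytes:
    # Hypothetical origin server that returns a fixed page body.
    return b"<html>page for " + url.encode() + b"</html>"


proxy = CachingProxy(fake_origin)
page_for_a = proxy.handle_request("http://example.com/news")  # user A: miss
page_for_b = proxy.handle_request("http://example.com/news")  # user B: hit
print(proxy.origin_fetches)  # 1
```

Only user A's request reaches the origin; user B receives the stored copy, which is the source of the latency savings.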
Proxy servers can also be used to filter requests, for example by enforcing access restrictions imposed by a network administrator. Corporate network administrators often configure their networks so that requests directed outside the network (e.g., to Internet Web sites or other resources) pass through one or more proxy servers where they are examined. If the request is made to a restricted site (e.g., as identified by its Web address), the request may be blocked. Alternatively, or in addition, if the content returned from a particular resource is deemed noncompliant with one or more network policies (e.g., because it is suspected of containing a computer virus or of originating from a restricted site), the content may be blocked from entering the corporate network.
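The two kinds of filtering described above, a request-side check on the Web address and a response-side check on the returned content, might look roughly like the sketch below. The host names and the byte-pattern "scanner" are hypothetical placeholders for administrator-defined policy data and a real malware scanner; only `urllib.parse.urlsplit` is a real library call.

```python
from urllib.parse import urlsplit

# Hypothetical administrator-defined policy data.
BLOCKED_HOSTS = {"games.example.net", "ads.example.org"}
# Stand-in for a malware scanner: byte patterns treated as "suspicious".
SUSPICIOUS_SIGNATURES = (b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE",)


def allow_request(url: str) -> bool:
    """Request-side check: block requests addressed to restricted hosts."""
    host = urlsplit(url).hostname or ""
    return host not in BLOCKED_HOSTS


def allow_response(body: bytes) -> bool:
    """Response-side check: block content matching a known bad signature."""
    return not any(sig in body for sig in SUSPICIOUS_SIGNATURES)


print(allow_request("http://games.example.net/play"))  # False
print(allow_request("https://news.example.com/"))      # True
```

Note that each check looks at one request or one response in isolation; neither function receives any information about why the request was made.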
In general, the proxy server can be configured to enforce a variety of rules or policies established by a network administrator. Sometimes, however, proper enforcement of these rules is difficult (or impossible) because the proxy lacks sufficient information about the context in which an associated request was made. As noted earlier, users often request Web pages, so network administrators would often like to set policies that allow or deny access to such pages. But Web “pages” are not pages or documents in the sense that most users understand those terms. Instead, a Web “page” is made up of computer-readable instructions (usually written in the hypertext markup language, HTML, or a similar computer-readable language) that tell a software application (typically a Web browser) how to display certain things (e.g., text, images, etc.). Those things, typically called “objects,” are usually stored separately from the computer-readable instructions that make up the Web page, sometimes even at different content sources. The Web browser retrieves each object with a separate request that it issues as it executes those instructions. For complex Web pages (such as those found at news Web sites and the like), this may involve dozens of requests for multiple objects to populate a single Web “page”.
As a result, no single policy can be directed to a Web “page” as a whole. Instead, policies must be more fine-grained, often operating at the object level or directed to the identity of the content source providing the object. Even these policies, however, often do not work well, because they cannot distinguish permissible instances of otherwise restricted content from impermissible instances thereof.
For example, consider an image that ordinarily would not be permitted within a corporate network under one or more policies. The image may actually be part of a Web page from an associated site (e.g., a news site) that is itself permitted to be viewed under the network policies. As the Web page loads in the requesting Web browser, the browser issues a request for the image. But because the proxy cannot recognize the context in which the request was made (i.e., that the image is part of a Web page that is currently loading), that request (or the returned image) will be blocked.
Traffic policy enforcement at a proxy is difficult in other instances as well. For example, one family of policies requires the proxy to add certain executable computer instructions to a Web document returned to a requesting Web browser so that the browser will take certain actions (e.g., block pop-up pages). It would be inappropriate, however, to add such instructions to other items, such as images, because the instructions are incompatible with those objects. Ordinarily the proxy can distinguish such objects on the basis of the content-type header information that accompanies each returned item, but this is not always possible. For example, certain JavaScript objects and cascading style sheets are often labeled as HTML documents (for which injecting executable computer instructions is appropriate) when in fact they are not (meaning such injection is not appropriate). The Web browser that requested these items can recognize that the header information is wrong, and ignore it, because it knows the context in which it requested each item. The proxy cannot make this assumption, because it has no information concerning the context of the original request. As a result, the proxy may improperly inject the computer instructions into these items.
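The failure mode just described can be illustrated with a sketch of a context-free injection decision. Everything here is hypothetical: `POPUP_BLOCKER_SNIPPET` stands in for whatever instructions a policy might require, and the snippet is simply appended to the body as a simplification (a real implementation would place it inside the document). The point is that the decision rests solely on the declared `Content-Type`, so a mislabeled style sheet is treated exactly like a genuine HTML page.

```python
# Hypothetical snippet a policy might require the proxy to inject into HTML.
POPUP_BLOCKER_SNIPPET = b"<script>window.open = function () { return null; };</script>"


def inject_if_html(content_type: str, body: bytes) -> bytes:
    """Append the snippet only when the declared Content-Type says HTML.

    The proxy sees only the response header, not why the browser asked for
    the item, so it has no choice but to trust the declared media type.
    """
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return body + POPUP_BLOCKER_SNIPPET
    return body


# A genuine HTML page: injection is appropriate.
page = inject_if_html("text/html; charset=utf-8", b"<html><body>News</body></html>")

# A style sheet mislabeled as HTML: the proxy cannot tell, and corrupts it.
css = inject_if_html("text/html", b"body { color: black; }")
print(css.endswith(POPUP_BLOCKER_SNIPPET))  # True
```

A browser that requested the same item through a style-sheet reference would know to treat it as CSS; the proxy, lacking that context, cannot.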
Accordingly, what is needed are improved techniques for facilitating network traffic filtering and other policy enforcement.