Internet traffic has increased significantly in recent years, and a need has arisen to develop scalable architectures for the delivery of Internet services. In place of a single centralised server handling all aspects of client requests, content providers are moving to a model of distributed, geographically diverse content distribution networks, in which a series of surrogate origin servers, or surrogates, are distributed within the network. Such architectures share the burden of client requests between several servers, and bring the serving nodes closer to the end users they are servicing. In addition to the load balancing advantages afforded by such distributed networks, the proximity of a given surrogate to the end user allows a content provider to offer additional user services which might be based, for example, on geographic location.
The Internet Content Adaptation Protocol (ICAP) is a lightweight, HTTP-like protocol specified in Request for Comments (RFC) 3507 and used to extend transparent proxy servers. ICAP enhances distributed content delivery networks by providing transformation services on HTTP messages. The ICAP server receives an HTTP message from a client, which may for example be a surrogate origin server. The ICAP server then performs some transformation of the HTTP message before sending a response back to the client, usually containing a modified HTTP message.
ICAP enables simple content transformations to be performed near the edge of the network, instead of requiring updated content from an origin server. This may include, for example, providing an updated advertisement each time a particular web page is viewed. Instead of requiring an updated page from an origin server each time a page is viewed, the page may be cached near the edge of the network, and an ICAP server used to insert a new advertisement each time the page is viewed. Other edge services may be offered by ICAP servers, including translation of web pages into an appropriate language for the geographical location of an end user, or transformation of web pages into formats appropriate for particular end user devices, for example tablet or mobile phone based web browsers.
ICAP servers may additionally enable the offloading of expensive operations from surrogates, thereby reducing the processing load at the surrogate. Virus scanning is one example of an expensive process that may be offloaded in this way. A surrogate responding to a download request from a user may offload to an ICAP server the task of scanning a downloaded program for viruses before accepting the program into its cache. Content filtering may also be implemented via ICAP, with firewalls or surrogates sending outgoing web page requests to an ICAP server to check that the Uniform Resource Identifier (URI) in the request is allowed before the web page is delivered to an end user.
In order to provide the functionality discussed above, ICAP servers may operate in different modes according to the nature of the task being conducted. In Request Modification (REQMOD) mode, an end user request is first sent to the ICAP server for modification before it is forwarded to an origin server or otherwise processed. REQMOD may be used to implement content filtering, with a URI being sent to the ICAP server to determine whether access to the URI is allowed. The ICAP server may return the request unchanged to the ICAP client if the URI is allowed, or return a modified request pointing to a page containing an error message if the URI is not allowed.
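The REQMOD content filtering decision described above can be sketched as follows. This is a minimal illustration only, not an implementation of any particular ICAP server: the blocked host list, the error page address and the function name `reqmod_filter` are all hypothetical, and only the HTTP request line is handled rather than a full encapsulated message.

```python
from urllib.parse import urlsplit

# Hypothetical policy data; a real deployment would load this from configuration.
BLOCKED_HOSTS = {"blocked.example"}
ERROR_PAGE = "http://filter.example/denied.html"


def reqmod_filter(request_line: str) -> str:
    """Return the request line unchanged if its URI is allowed,
    otherwise a modified request line pointing at the error page."""
    method, uri, version = request_line.split(" ", 2)
    host = urlsplit(uri).hostname or ""
    if host in BLOCKED_HOSTS:
        # URI not allowed: rewrite the request so that it fetches the error page.
        return f"GET {ERROR_PAGE} {version}"
    # URI allowed: hand the request back to the ICAP client untouched.
    return request_line
```

In this sketch the ICAP client would forward whichever request line it receives back, so a disallowed request results in the end user being served the error page.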
In Response Modification (RESPMOD) mode, an ICAP client sends an HTTP response, typically generated by an origin server, to an ICAP server before the response is forwarded to the end user. The ICAP server may modify the response before returning it to the ICAP client, for example by conducting virus scanning, content translation or content formatting.
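By way of illustration, the following sketch shows how an ICAP client might wrap an HTTP response in a RESPMOD request. It relies on the Encapsulated header and chunked body encoding defined in RFC 3507; the function name `build_respmod` and the example service URI are hypothetical, and the sketch omits the optional original request headers and any preview.

```python
def build_respmod(service_uri: str, host: str, res_hdr: bytes, res_body: bytes) -> bytes:
    """Wrap an HTTP response in an ICAP RESPMOD request (no req-hdr, no preview)."""
    if not res_hdr.endswith(b"\r\n\r\n"):
        raise ValueError("HTTP header block must end with a blank line")

    if res_body:
        body_token = "res-body"
        # One chunk carrying the whole body, then the zero-length terminating chunk.
        encoded_body = (
            f"{len(res_body):x}\r\n".encode("ascii") + res_body + b"\r\n0\r\n\r\n"
        )
    else:
        # No HTTP body to send: signal this with null-body instead.
        body_token = "null-body"
        encoded_body = b""

    # Encapsulated lists byte offsets of each part within the ICAP message body.
    icap_header = (
        f"RESPMOD {service_uri} ICAP/1.0\r\n"
        f"Host: {host}\r\n"
        f"Encapsulated: res-hdr=0, {body_token}={len(res_hdr)}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_header + res_hdr + encoded_body


# Example use with a hypothetical virus scanning service.
http_headers = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"
message = build_respmod("icap://icap.example.net/scan", "icap.example.net",
                        http_headers, b"hello")
```

The ICAP server would reply with either the original or a modified HTTP response, which the ICAP client then forwards to the end user.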
In OPTIONS mode, an ICAP server provides configuration information to an ICAP client, enabling the ICAP client to interact with the ICAP server.
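As an illustration of the kind of configuration information exchanged, the following sketch parses the headers of an OPTIONS response. The header names Methods, Preview and Transfer-Preview are those defined in RFC 3507; the function name `parse_options_response`, the sample values and the shape of the returned dictionary are assumptions made for this example.

```python
def parse_options_response(raw: str) -> dict:
    """Extract a few configuration values from an ICAP OPTIONS response."""
    lines = raw.strip().splitlines()
    status_line, header_lines = lines[0], lines[1:]

    headers = {}
    for line in header_lines:
        if not line.strip():
            break  # a blank line ends the ICAP header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    def split_list(name: str) -> list:
        # Comma-separated header value to a list of trimmed items.
        return [item.strip() for item in headers.get(name, "").split(",") if item.strip()]

    return {
        "status": int(status_line.split()[1]),
        "methods": split_list("methods"),
        "preview_bytes": int(headers["preview"]) if "preview" in headers else None,
        "transfer_preview": split_list("transfer-preview"),
    }


# Sample response with made-up values, for illustration only.
SAMPLE = (
    "ICAP/1.0 200 OK\r\n"
    "Methods: RESPMOD\r\n"
    'ISTag: "example-tag-1"\r\n'
    "Preview: 1024\r\n"
    "Transfer-Preview: exe, zip\r\n"
    "Encapsulated: null-body=0\r\n"
    "\r\n"
)
config = parse_options_response(SAMPLE)
```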
In some cases, requests sent by a client to an ICAP server in either RESPMOD or REQMOD mode may include a preview of the HTTP message for transformation. The preview enables an ICAP server to see the beginning of a new transaction and decide whether to opt out of the transaction before receiving the remainder of the message. Appropriate use of previewing can yield significant performance improvements by avoiding unnecessary transaction load between ICAP clients and servers. However, previewing all messages would introduce a significant additional message load, so ICAP servers may specify to ICAP clients which messages should be previewed. This specification is based upon the file extension of the Uniform Resource Locator (URL) included in the message to be sent. File extensions that should be previewed are communicated to ICAP clients via a message exchange with the ICAP server operating in OPTIONS mode. Before transmitting a subsequent message to the ICAP server, an ICAP client may then determine whether a preview should be sent, based on the file extension of the URL in the message and in accordance with the specification received from the ICAP server.
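The client-side decision described above can be sketched as follows. This is a simplified illustration: the function names `url_extension` and `should_preview` are hypothetical, the set of extensions is assumed to have been obtained already from an OPTIONS exchange, and wildcard or default entries are not handled.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the lower-case file extension of the URL path, or '' if there is none."""
    path = urlsplit(url).path  # the query string and fragment are ignored
    return posixpath.splitext(path)[1].lstrip(".").lower()


def should_preview(url: str, preview_extensions: set) -> bool:
    """Decide whether a preview should be sent for a message carrying this URL."""
    ext = url_extension(url)
    return ext != "" and ext in preview_extensions


# Extensions assumed to have been advertised by the ICAP server.
advertised = {"exe", "zip"}
print(should_preview("http://example.com/downloads/setup.EXE", advertised))
print(should_preview("http://example.com/article/1234?id=7", advertised))
```

In this sketch a URL whose path carries no file extension never matches the advertised list, so no preview is sent for it.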
While the above system for implementing previewing in ICAP has been effective in the past, the majority of web pages are now dynamic, meaning that most URLs do not include a file extension. The advantages that may be gained from effective use of previewing ICAP messages are therefore lost, as ICAP clients have no information on which to base a decision as to whether or not a preview should be sent to the ICAP server.