In general, a server may be configured to provide information to one or more clients according to a client/server model of information delivery. In this model, the server is a storage system that typically contains one or more mass storage devices, such as magnetic hard disks, in which information may be stored and retrieved as desired. The server is usually deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, optical or wireless links, that allow the clients to remotely access the server's stored information. The clients may include network devices or computers that are directly or indirectly attached to the server, e.g., via point-to-point links, shared local area networks (LAN), wide area networks (WAN) or virtual private networks (VPN) implemented over a public network such as the Internet. Yet other clients may include software applications executing on computers that are configured to communicate with the server.
In some client/server arrangements, the server may be configured as a network cache that buffers previously accessed or frequently accessed client information. As such, the server provides a set of clients with faster access to the buffered information than if they were to access the same information directly from the origin servers that normally serve the information. For instance, the set of clients may be physically situated closer to the network cache than to the origin servers, or the clients may be able to access the cache over a lower-latency (or higher-bandwidth) data path. The network cache's buffered information is typically in the form of files that are made accessible to the set of clients. As used herein, a file is any collection of data that is identifiable by a common name, such as a uniform resource locator (URL), and therefore may include conventional files, HyperText Markup Language (HTML) files (“web pages”) or other data objects.
In practice, a network cache can be configured to operate as a “reverse proxy” or “forward proxy” cache. A reverse-proxy cache is a server that stores a selected set of information from one or more origin servers. For example, a multimedia company may copy selected streaming audio or video content from its origin servers to a reverse-proxy cache, which is then used as an “accelerator” for providing the selected content to clients.
In contrast, a forward-proxy cache is a server that buffers network data for a particular set of clients. Accordingly, unlike the reverse-proxy cache, the forward-proxy cache does not necessarily store selected data from specific origin servers and instead may store data from a variety of different origin servers, i.e., based on the network traffic patterns of the cache's particular set of clients.
Clients typically communicate with a network cache by exchanging discrete packets of data formatted according to predefined file-access protocols, such as the HyperText Transfer Protocol (HTTP), Network File System (NFS) protocol, Common Internet File System (CIFS) protocol, File Transfer Protocol (FTP), etc. A client may issue a file-access request that specifies, among other things, a specific file to access and a particular file operation to perform. The network cache receives the client request, processes the request, and when appropriate returns a response. For example, the client may issue a file “read” request to the cache, and, in response, the cache may return a file-access response containing the client's requested file.
Often, the file-access requests and responses that are exchanged between a network cache and its clients include one or more packet headers, such as Multipurpose Internet Mail Extensions (MIME) headers, containing file content and disposition information. For instance, a client's file-access request or response may include MIME headers that specify the type of content requested, a type of content-transfer-encoding, a uniform resource identifier (URI) or uniform resource locator (URL) identifying a particular requested file, and so forth. MIME headers and their uses are generally described in more detail in the Request For Comments (RFC) 2045 entitled Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, by N. Freed et al., published November 1996, which is available through the Internet Engineering Task Force (IETF) and is hereby incorporated by reference as though fully set forth herein.
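By way of illustration only, the following minimal sketch shows how MIME headers of the kind described above might be read from a response. It uses Python's standard `email` package; the header values and the helper name `describe_headers` are assumptions made for this example and are not taken from any particular cache implementation.

```python
from email.parser import HeaderParser

# Hypothetical header block for a file-access response (values are illustrative).
RAW_HEADERS = (
    "Content-Type: image/jpeg\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    'Content-Disposition: attachment; filename="foobar.jpg"\r\n'
    "\r\n"
)

def describe_headers(raw: str) -> dict:
    """Extract the content type, transfer encoding and filename from MIME headers."""
    msg = HeaderParser().parsestr(raw)
    return {
        "content_type": msg.get_content_type(),
        "transfer_encoding": msg.get("Content-Transfer-Encoding"),
        "filename": msg.get_filename(),
    }

info = describe_headers(RAW_HEADERS)
print(info)
```

In this sketch the filename is taken from the Content-Disposition header; a given protocol may carry the same information elsewhere, e.g., in the request URL.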
It is often desirable to scan client-requested files for viruses or other illicit content before the files are returned to their requesting clients. If a virus is found in a requested file, the client may be notified that the file is not currently accessible. Alternatively, the file may be “cleaned” in order to remove the virus before the file is returned to the client. As used herein, a non-viral file is defined as a file that does not contain executable code and thus is incapable of containing a virus.
The Internet Content Adaptation Protocol (ICAP) provides a mechanism for transforming (or “adapting”) clients' file-access requests and/or responses according to a predetermined set of policies or rules, e.g., selected by a system administrator. Accordingly, ICAP may be used to scan client-requested files for viruses and to transform file-access responses containing those files in which viruses have been detected. Besides virus scanning of client-requested files, ICAP also may be used to perform other types of object-based content vectoring, as described in more detail in RFC 3507 entitled Internet Content Adaptation Protocol (ICAP), by J. Elson et al., published April 2003, which is hereby incorporated by reference as though fully set forth herein.
A client may send a file-access request to a network cache which processes that request and prepares a corresponding response. Before returning the response to the requesting client, the cache first may forward the response to an ICAP server for virus scanning and/or other processing. If the ICAP server determines that the response includes a non-viral client-requested file, the ICAP server returns the response to the cache without modification. Then, the cache forwards the non-modified response to the requesting client. On the other hand, if the ICAP server identifies a virus in the client-requested file, the ICAP server may modify the client-requested file so as to remove the virus or may modify the response to indicate that the requested file is not presently available. In either case, the modified response is returned to the network cache, which then forwards the modified response to the requesting client.
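The exchange described above may be sketched, purely for illustration, as follows. The sketch does not implement the ICAP wire protocol: `icap_scan` is a hypothetical in-process stand-in for the ICAP server, `FAKE_SIGNATURE` is an invented marker rather than a real virus signature, and the 403 status used for an unavailable file is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

@dataclass(frozen=True)
class Response:
    status: int
    filename: str
    body: bytes

# Invented marker standing in for "a virus was detected" (assumption).
FAKE_SIGNATURE = b"FAKE-VIRUS-SIGNATURE"

def icap_scan(resp: Response, clean: bool = True) -> Response:
    """Stand-in for the ICAP server: return the response unmodified when no
    signature is found; otherwise clean the file or mark it unavailable."""
    if FAKE_SIGNATURE not in resp.body:
        return resp
    if clean:
        return replace(resp, body=resp.body.replace(FAKE_SIGNATURE, b""))
    return replace(resp, status=403, body=b"File not presently available")

def cache_handle(filename: str, store: Dict[str, bytes],
                 scan: Callable[[Response], Response] = icap_scan) -> Response:
    """Stand-in for the network cache: prepare the response, forward it for
    scanning, then return whatever comes back to the requesting client."""
    prepared = Response(200, filename, store[filename])
    return scan(prepared)

store = {"a.txt": b"hello", "b.bin": b"xx" + FAKE_SIGNATURE + b"yy"}
clean_result = cache_handle("a.txt", store)
cleaned_result = cache_handle("b.bin", store)
blocked_result = cache_handle("b.bin", store, scan=lambda r: icap_scan(r, clean=False))
```

Note that every response passes through `scan` in this sketch, including the one that is returned unmodified.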
According to the conventional ICAP arrangement, the network cache or origin server sends every file-access response to the ICAP server for a virus scan before the response may be returned to its requesting client. Although effective, this conventional arrangement suffers from several disadvantages. For instance, not only does a requesting client have to wait for the network cache or origin server to retrieve the client's requested file and return an appropriate response, but the client also has to wait for the response to be sent to an ICAP server, processed by the ICAP server and returned to the network cache or origin server. The added latency that the client experiences due to the ICAP processing is generally undesirable and in some cases may negatively affect the client's functionality. Further, the frequent communications between the network cache or origin server and the ICAP server may consume an excessive amount of network bandwidth that otherwise could be used for higher-priority network traffic. In addition, when the ICAP server is coupled to a relatively large number of network caches and origin servers, the ICAP server may have to perform an excessive number of virus scans which, in turn, may exhaust the server's processing resources, such as its available memory and processing bandwidth.
One technique for reducing the number of virus scans performed at the ICAP server requires the network cache or origin server to determine whether a file-access response contains a file whose filename corresponds to a non-viral file type. If so, the cache or origin server identifies the file as non-viral and returns the response to its requesting client without first having to perform conventional ICAP processing. To that end, the response may be analyzed to determine whether it contains a MIME header specifying a filename with a file extension corresponding to a non-viral file type. For example, suppose the response includes a MIME header that identifies a client-requested file having the filename “foobar.jpg”. In this case, the extension “.jpg” indicates that the file is a non-viral image file formatted according to the Joint Photographic Experts Group (JPEG) standard. Accordingly, the response containing foobar.jpg may be returned directly to the requesting client without first having to virus scan the file at the ICAP server.
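A minimal sketch of the filename check described above is given below for illustration. The set `NON_VIRAL_EXTENSIONS` and the function name `needs_icap_scan` are assumptions made for this example; which extensions are treated as non-viral would in practice be a matter of configuration.

```python
import os
from email.parser import HeaderParser

# Illustrative list of extensions treated as non-viral (assumption).
NON_VIRAL_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}

def needs_icap_scan(raw_headers: str) -> bool:
    """Return False only when the MIME headers name a file whose extension
    is on the non-viral list; otherwise the response goes to the ICAP server."""
    filename = HeaderParser().parsestr(raw_headers).get_filename()
    if not filename:
        return True  # no filename to inspect, so scan conservatively
    extension = os.path.splitext(filename)[1].lower()
    return extension not in NON_VIRAL_EXTENSIONS

jpg_headers = 'Content-Disposition: attachment; filename="foobar.jpg"\r\n\r\n'
exe_headers = 'Content-Disposition: attachment; filename="setup.exe"\r\n\r\n'
no_name_headers = "Content-Type: application/octet-stream\r\n\r\n"
```

With these inputs, only the response naming foobar.jpg bypasses the scan; the decision rests entirely on the filename and never on the file's contents.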
A problem with the above-noted technique for reducing the amount of ICAP processing is that it may be easily circumvented by improperly named files, i.e., files whose filenames do not match their actual file types. For instance, a file having a filename foobar.jpg actually may be an executable file even though its filename says otherwise. Consequently, the filename may misidentify the file as non-viral when in fact it contains an executable virus. In this situation, the misnamed file may improperly forgo conventional ICAP virus scanning when such a scan would detect the file's virus.
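The circumvention described above can be made concrete with a short sketch. It compares what a filename suggests with the leading bytes of the file, relying on the facts that JPEG images begin with the bytes FF D8 FF and that Windows executables begin with the characters “MZ”. The function names and sample payloads are hypothetical, and the byte comparison is used here only to expose the mismatch.

```python
import os

JPEG_EXTENSIONS = {".jpg", ".jpeg"}
JPEG_MAGIC = b"\xff\xd8\xff"  # leading bytes of a JPEG image
PE_MAGIC = b"MZ"              # leading bytes of a Windows executable

def skipped_by_extension(filename: str) -> bool:
    """True when a filename-only check would treat the file as a non-viral JPEG."""
    return os.path.splitext(filename)[1].lower() in JPEG_EXTENSIONS

def starts_like_jpeg(body: bytes) -> bool:
    """True when the file's leading bytes match the JPEG signature."""
    return body.startswith(JPEG_MAGIC)

def is_mismatch(filename: str, body: bytes) -> bool:
    """True when the name claims JPEG but the contents do not begin like one."""
    return skipped_by_extension(filename) and not starts_like_jpeg(body)

# Hypothetical payloads: a plausible JPEG prefix and an executable prefix.
genuine = ("photo.jpg", JPEG_MAGIC + b"\xe0" + b"\x00" * 16)
misnamed = ("foobar.jpg", PE_MAGIC + b"\x90\x00" + b"\x00" * 16)
```

Both files would bypass the scan under a filename-only check, yet only the first actually begins like a JPEG image.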
It is therefore generally desirable to provide a more reliable technique for reducing the number of file-access responses that are sent to the ICAP server for processing before the responses may be forwarded to their requesting clients.