In a client-server environment, packets are exchanged between a server computer and one or more client computers. In an HTTP environment, for example, an HTTP server typically exchanges HTTP packets with one or more HTTP clients. HTTP technology is quite well established and will not be explained in detail herein.
From time to time, there may exist a need to parse the HTTP packets to obtain the information encapsulated by the packets. For example, applications such as scanning to support malware detection (e.g., virus or adware) and/or content filtering to support business rule implementation often parse HTTP packets to obtain the content (e.g., payload) of the packet. The content may then be scanned and/or filtered to detect the possible presence of malware, for example.
FIG. 1 shows a view of a portion of a data stream from the HTTP 1.1 (RFC 2616) perspective. Generally speaking, there are two types of HTTP responses: a HEAD response and a GET response. From the point of view of HTTP, a HEAD response returns only the HTTP header while a GET response returns both the HTTP header (102 and 106) and the HTTP content (104 and 108). For certain applications, the content is of primary interest. Accordingly, the ability to distinguish between a GET response and a HEAD response is important, as relevant content exists in the GET response but not in the HEAD response.
Boundary determination is also an important issue to resolve. A boundary marks the termination of a given file and the start of a new file. Accurate and efficient boundary determination allows the application to accurately and efficiently obtain the content of a given file, for example.
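By way of illustration only, boundary determination over a server-to-client byte stream may be sketched as follows. The sketch assumes that every response carries content delimited by a Content-Length header (no chunked transfer encoding, no HEAD responses); the function name split_responses is hypothetical and is not taken from any particular parser.

```python
def split_responses(stream: bytes):
    """Split back-to-back HTTP responses into (header, content) pairs.

    Assumption: each response is delimited by Content-Length, and a
    missing Content-Length header is treated as zero bytes of content.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        # The header ends at the first blank line (CRLF CRLF).
        header_end = stream.find(b"\r\n\r\n", pos)
        if header_end == -1:
            break  # incomplete header; more data would be needed
        header = stream[pos:header_end]

        # Skip the status line, then look for Content-Length.
        length = 0
        for line in header.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())

        body_start = header_end + 4
        body_end = body_start + length  # boundary between two files
        if body_end > len(stream):
            break  # incomplete content; more data would be needed
        messages.append((header, stream[body_start:body_end]))
        pos = body_end
    return messages
```

Under these assumptions, each computed body_end marks the termination of one file and the start of the next. The sketch would misplace the boundary for a HEAD response, since such a response may carry a Content-Length header with no content following it.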
In the prior art, HTTP parsing is performed on both client-transmitted packets (e.g., the client requests) and server-transmitted packets (e.g., the server responses) in order to accomplish boundary determination. This is because the typical HTTP parser, when examining a response, only sees the HTTP header and, if content exists, the HTTP content; it generally must consult the corresponding client request to know whether content is expected to follow a given header. The need to parse both client-transmitted packets and server-transmitted packets disadvantageously imposes a heavy processing load on the system's CPU and/or memory resources, leading to degraded system performance. System performance is further degraded if state-based parsing is employed to parse the client-transmitted packets and the server-transmitted packets, since state machines (such as state machine 302 illustrated in the example of FIG. 3) tend to be resource-intensive to execute.
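For illustration, the two-way parsing described above may be sketched as follows. This is a simplified, hypothetical example rather than the implementation of any particular prior-art parser: the class name TwoWayTracker and its methods are assumptions, responses are assumed to arrive in the same order as their requests on a given connection, and content length is assumed to be signaled by a Content-Length header.

```python
from collections import deque


class TwoWayTracker:
    """Hypothetical tracker that parses both directions of one connection."""

    def __init__(self):
        # Methods of requests that have not yet been answered, oldest first.
        self.pending = deque()

    def on_request(self, raw: bytes) -> None:
        # A request line looks like b"GET /path HTTP/1.1"; keep the method.
        method = raw.split(b" ", 1)[0].upper()
        self.pending.append(method)

    def on_response_header(self, header: bytes) -> int:
        """Return the number of content bytes expected after this header."""
        # Assumption: responses are returned in request order.
        method = self.pending.popleft()
        if method == b"HEAD":
            return 0  # a HEAD response carries a header only
        for line in header.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                return int(value.strip())
        return 0  # assumption: no Content-Length means no content
```

In this sketch, every client request must be inspected and remembered solely so that the matching response can be classified, which suggests where the additional CPU and memory load of the two-way approach arises.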