This invention generally relates to data processing in the field of networks. The present invention relates more specifically to the aggregation of information about message flows.
In computer networks, it is desirable to collect information about how a network is used. The information can be used by network administrators, routing devices, service providers, and users. This information may describe how network messages or packets are transmitted in the network: their source or destination, number, frequency, size, protocol type, priority, or other administrative information such as security classifications or accounting information. This information may be aggregated by a variety of categories: for the entire network or subnetworks thereof, for groups of sources or destinations, or for particular types of packets (such as particular size, protocol type, priority, security classifications, or accounting information). A stream of packets passing through the network is known as a "flow."
However, in many computer networks, the number of packets transmitted in the network is large, and thus the amount of information to be collected is extremely large. Often, the resources needed to process this information, such as static storage and processor power, are much larger than are available or practical.
A first known method for collecting information about use of the network is to couple a monitoring processor to a link in the network, and to monitor traffic which passes through that link. For example, the monitoring processor could be coupled to a local-area network (LAN) or coupled to a router, and could monitor traffic input to or output from that router using that LAN. A protocol known as "RMON" (remote monitoring) is known for transmitting messages relating to monitoring information between the monitoring processor and the router. However, this known method is subject to several significant drawbacks. For example, the number of packets input to and output from the router usually greatly exceeds the capability of the monitoring processor to collect and process information about packets. Also, the monitoring processor may be able to collect and process information only about packets which pass through that particular link.
A second known method for collecting information about use of the network is to couple the monitoring processor to the router using protocols at layer 3 of the OSI model, such as using the Internet Protocol ("IP") to communicate between the monitoring processor and the router. The RMON protocol may also be used to transmit messages relating to monitoring information between the monitoring processor and the router in this configuration. However, this second method also has drawbacks. For example, the monitoring processor may be unable to collect information from the router in sufficient detail, or if information is available in sufficient detail, that information may greatly exceed the capability of the monitoring processor to collect and process it.
In a third known method, a router provides the aggregated information to one or more filters at an output port. Each filter selects only a subset of the total set of flows. The filters may be combined to create compound filters and may be coupled to aggregators, which further aggregate flow data and may store flow data for use by application programs. The filters may select information using a variety of criteria, including: (1) ranges of addresses for source and destination; (2) information about packets in the flow, such as the number and frequency of the packets in the flow and the size of the packets in the flow (total size and distribution); (3) the protocol used for the flow, such as whether the flow uses an electronic mail protocol, a file transfer protocol, a hypertext transfer protocol ("HTTP"), a real-time audiovisual data transmission protocol, or some other protocol; and (4) other administrative criteria which may be pertinent to the flow, such as the time of initiation or duration of the flow. However, even in the third method, the quantity of information generated may greatly exceed the resources available to handle it. In addition, much of the information captured may be incomplete or have little informational value, and some relevant information may not be captured at all.
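The filter arrangement of the third method may be illustrated by the following minimal sketch, in which filters are modeled as predicates over flow records and combined into a compound filter that feeds a simple aggregator. The names Flow, source_in, protocol_is, and all_of, the record fields, and the sample addresses are assumptions introduced only for illustration and do not correspond to any particular router implementation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src: str
    dst: str
    protocol: str
    packets: int


Filter = Callable[[Flow], bool]


def source_in(cidr: str) -> Filter:
    """Criterion (1): select flows whose source lies in an address range."""
    net = ipaddress.ip_network(cidr)
    return lambda f: ipaddress.ip_address(f.src) in net


def protocol_is(name: str) -> Filter:
    """Criterion (3): select flows that use a given protocol."""
    return lambda f: f.protocol == name


def all_of(*filters: Filter) -> Filter:
    """Compound filter: a flow is selected only if every filter selects it."""
    return lambda f: all(flt(f) for flt in filters)


flows = [
    Flow("10.0.0.5", "192.0.2.10", "http", 12),
    Flow("10.0.0.6", "192.0.2.10", "smtp", 4),
    Flow("172.16.0.9", "192.0.2.10", "http", 30),
]

# Compound filter: HTTP flows originating in 10.0.0.0/8.
http_from_10 = all_of(source_in("10.0.0.0/8"), protocol_is("http"))
selected = [f for f in flows if http_from_10(f)]

# A trivial aggregator: total packet count over the selected subset.
total_packets = sum(f.packets for f in selected)
```

Each filter here selects only a subset of the flows, so the aggregator sees one record of the three; the sketch does not address the volume problem noted above.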
For example, in a network that conforms to Internet protocols, a request for data may be sent using HTTP from a source device A at port 2000, to a destination device B at port 80, the well-known port for receiving HTTP requests. Often, but not always, a host receiving an HTTP request at port 80 responds by transmitting data from port 80 to the requestor. However, to reduce contention for port 80, a host may employ "port switching," and thus may respond from a different port.
In this example, assume that device B sends the requested data to device A at port 2000, but sends from port 2999 instead of port 80. To capture HTTP traffic related to device A, a filter on the router has been configured to capture and aggregate traffic from source device A to a destination device at port 80, and traffic from source device B sent from source port 80. Thus, the filter fails to capture the response from device B from port 2999. While the filter may be configured to capture traffic between an expanded set of ports that includes port 2999, the additional data captured may not necessarily be related to HTTP traffic, or may even be too large in quantity to be handled by available resources.
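The port-switching example may be sketched as follows, using the device names and port numbers given above. The FlowRecord type, its field names, the packet counts, and the function port_80_filter are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src: str
    src_port: int
    dst: str
    dst_port: int
    packets: int


HTTP_PORT = 80


def port_80_filter(rec: FlowRecord) -> bool:
    """Filter as configured in the example: traffic from device A to
    port 80, or traffic from device B sent from source port 80."""
    from_a_to_80 = rec.src == "A" and rec.dst_port == HTTP_PORT
    from_b_port_80 = rec.src == "B" and rec.src_port == HTTP_PORT
    return from_a_to_80 or from_b_port_80


flows = [
    # Request from A (port 2000) to B's well-known HTTP port.
    FlowRecord("A", 2000, "B", 80, 3),
    # Response from B after port switching: sent from 2999, not 80.
    FlowRecord("B", 2999, "A", 2000, 40),
]

captured = [r for r in flows if port_80_filter(r)]
missed = [r for r in flows if not port_80_filter(r)]
```

In this sketch the request is captured and the port-switched response is missed, even though both belong to the same HTTP exchange.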
Thus, there is a need for methods, mechanisms, or systems whereby the vast amount of flow data produced by network elements may be condensed, organized and made useful.
Accordingly, it would be desirable to provide a method and system for monitoring information about network usage, while avoiding overwhelming the limited resources available to process and store the information.
There is a particular need for a mechanism of aggregating information about related network traffic at a sufficient level of detail.
The foregoing needs, and other needs that will become apparent from the following description, are achieved by the present invention, in one aspect, through the aggregation of related flow records. Specifically, flow records may be organized according to whether they are request records, response records associated with the request records, or flow records associated with neither category. A request record may represent a network flow to a particular device of a particular network flow type, for example, network flow to a standard port. The request records and flow records are then aggregated.
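One way the organization of flow records described above might be sketched is shown below. This section does not state how a response record is associated with a request record, so the matching rule used here is an assumption made only for illustration: a response is taken to be a flow from the request's destination device back to the request's source address and port. The names FlowRecord, STANDARD_PORTS, classify, and aggregate are likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src: str
    src_port: int
    dst: str
    dst_port: int
    packets: int


STANDARD_PORTS = {80}  # assumption: only the HTTP port is treated as standard


def classify(records):
    """Split records into request records, response records, and others."""
    requests = [r for r in records if r.dst_port in STANDARD_PORTS]
    # Endpoints awaiting a response: (server, client, client port).
    pending = {(r.dst, r.src, r.src_port) for r in requests}
    responses, others = [], []
    for r in records:
        if r.dst_port in STANDARD_PORTS:
            continue  # already counted as a request
        # Assumed rule: match on the reverse path, ignoring the server's
        # source port so that port-switched responses still match.
        if (r.src, r.dst, r.dst_port) in pending:
            responses.append(r)
        else:
            others.append(r)
    return requests, responses, others


def aggregate(requests, responses):
    """Sum packet counts per (client, server) pair across both directions."""
    totals = defaultdict(int)
    for r in requests:
        totals[(r.src, r.dst)] += r.packets
    for r in responses:
        totals[(r.dst, r.src)] += r.packets
    return dict(totals)


flows = [
    FlowRecord("A", 2000, "B", 80, 3),     # request to a standard port
    FlowRecord("B", 2999, "A", 2000, 40),  # port-switched response
    FlowRecord("C", 5000, "D", 6000, 7),   # associated with neither category
]

requests, responses, others = classify(flows)
totals = aggregate(requests, responses)
```

Under these assumptions, the response sent from port 2999 is grouped with its request, and the two are aggregated under the single key ("A", "B").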