1. Field of Invention
The present invention relates to systems and methods for pattern recognition and, more particularly, to systems and methods for analyzing patterns of data in communications.
2. Background of Invention
The predominant model of data communications today is the use of individual packets or frames of data that are routed individually through a network from a source to a destination. This type of service is used in many computer networks. Each packet comprises a number of layers of protocol headers and data for one or more network protocols. For each network protocol, the protocol headers and data are generally defined by some number of fixed or variable length fields, each field having predefined value(s). Another way of describing a network protocol is to say that the protocol defines an ordered series of elements, each element having an offset from the beginning of the packet and a data value. Packets conforming to the network protocol must have elements that satisfy the defined data values at their respective offsets. The term "packet" is used herein to describe any type of data communication unit that is defined according to a network protocol, including conventional packets, frames (e.g. Ethernet, Token Ring, or FDDI), cells (e.g. ATM), and the like.
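The element-based description of a protocol above can be illustrated with a minimal Python sketch. It assumes fixed-offset elements only (variable length fields are not modeled), and the names `Element`, `matches`, and `ETHERNET_IPV4` are hypothetical, chosen for illustration rather than taken from the invention. A packet conforms when every element's data value appears at that element's offset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical protocol element: expected bytes at a fixed offset."""
    offset: int   # byte offset from the beginning of the packet
    value: bytes  # data value that must appear at that offset


def matches(packet: bytes, protocol: list[Element]) -> bool:
    """Return True if every element's value appears at its offset in the packet."""
    for elem in protocol:
        end = elem.offset + len(elem.value)
        # A packet too short to hold the element cannot conform.
        if end > len(packet) or packet[elem.offset:end] != elem.value:
            return False
    return True


# Example: an Ethernet II frame carrying IPv4 has EtherType 0x0800 at offset 12.
ETHERNET_IPV4 = [Element(offset=12, value=b"\x08\x00")]

frame = bytes(12) + b"\x08\x00" + bytes(20)  # zeroed MAC addresses, then EtherType
print(matches(frame, ETHERNET_IPV4))  # True
```

Representing a protocol as data (a list of offset/value pairs) rather than as hand-written comparison code is what lets the same matcher be reused for any protocol that can be described this way.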
It is helpful for network operations, such as traffic analysis, to capture and inspect packets as they travel through a particular location on the network. Inspection is done to determine the quantities, distributions, and the like of various types of packets (i.e. which protocols are used), their sources, destinations, and so on.
Identification of packets is typically done by simple pattern matching between a pattern or filter, defined by the network protocol for the aspect of the packet to be matched, and the relevant portion of the packet being inspected. For example, in a typical local area network (LAN), the traffic may consist of several different types of protocols, such as FTP running on top of IP, Telnet, NFS, etc. A LAN protocol analyzer is conventionally used to capture and inspect these packets. However, rather than inspecting all the packets, a system administrator may be interested in inspecting (e.g. counting) only, for example, the FTP packets with a particular destination IP address.
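As an illustration of such a filter, the following Python sketch expresses "FTP packets with a particular destination IP address" as a set of offset/value elements. It assumes an untagged Ethernet II frame, an IPv4 header without options (20 bytes), and FTP control traffic on TCP destination port 21; under those assumptions the EtherType sits at byte 12, the IPv4 protocol field at byte 23, the destination address at bytes 30 to 33, and the TCP destination port at bytes 36 to 37. The helper names and the address 192.0.2.10 (an RFC 5737 documentation address) are hypothetical.

```python
from __future__ import annotations

import socket
import struct


def ftp_to_host_filter(dst_ip: str) -> list[tuple[int, bytes]]:
    """Hypothetical filter: (offset, value) elements for FTP control packets to dst_ip.

    Assumes untagged Ethernet II, a 20-byte IPv4 header (no options),
    and FTP control traffic on TCP destination port 21.
    """
    return [
        (12, b"\x08\x00"),               # EtherType = IPv4
        (23, b"\x06"),                   # IPv4 protocol = 6 (TCP)
        (30, socket.inet_aton(dst_ip)),  # IPv4 destination address
        (36, struct.pack("!H", 21)),     # TCP destination port = 21 (FTP control)
    ]


def matches(packet: bytes, elements: list[tuple[int, bytes]]) -> bool:
    """True if each value appears at its offset (short slices never compare equal)."""
    return all(packet[off:off + len(val)] == val for off, val in elements)


def build_frame(dst_ip: str, dst_port: int) -> bytes:
    """Build a minimal Ethernet/IPv4/TCP frame for testing the filter."""
    eth = bytes(12) + b"\x08\x00"          # zeroed MACs, EtherType IPv4
    ip = bytearray(20)
    ip[0] = 0x45                           # version 4, header length 5 words
    ip[9] = 6                              # protocol TCP
    ip[16:20] = socket.inet_aton(dst_ip)   # destination address
    tcp = bytearray(20)
    tcp[2:4] = struct.pack("!H", dst_port) # destination port
    return eth + bytes(ip) + bytes(tcp)


f = ftp_to_host_filter("192.0.2.10")
print(matches(build_frame("192.0.2.10", 21), f))  # True
print(matches(build_frame("192.0.2.10", 23), f))  # False (Telnet port)
print(matches(build_frame("192.0.2.99", 21), f))  # False (other host)
```

Real traffic would also require handling IP options (a variable IPv4 header length) and VLAN tags, both of which shift the later offsets; the sketch deliberately ignores them.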
In a typical protocol analyzer, pattern matching is done by comparing stored pattern data, defined by the network protocols for an FTP packet with a given IP address, against the data captured from the network. If there are several pattern matching criteria, as in this example, a pattern for each of them is applied to the captured packet data. This conventional method implies that, for multiple (say N) patterns, the data packet has to be scanned N times and compared with a pattern each time. The time required to do these comparisons therefore increases in proportion to the number of patterns to be matched (∝ N). The space (number of bytes) required to store the pattern data also increases in direct proportion to the number of filters (∝ N).
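The linear cost of the conventional approach can be made concrete with a short Python sketch. It is an illustrative model only, assuming each pattern is a list of (offset, value) elements; the function name `naive_match` is hypothetical. Each pattern is applied to the packet in its own pass, and the function counts byte comparisons so the growth with N is visible.

```python
from __future__ import annotations


def naive_match(packet: bytes,
                patterns: list[list[tuple[int, bytes]]]) -> tuple[list[int], int]:
    """Apply each pattern to the packet in turn (the conventional approach).

    Returns the indices of matching patterns and the number of byte
    comparisons performed. With N patterns the packet is scanned N times,
    so the worst-case comparison count grows in proportion to N.
    """
    matched: list[int] = []
    comparisons = 0
    for i, pattern in enumerate(patterns):   # one pass over the packet per pattern
        ok = True
        for offset, value in pattern:
            for j, expected in enumerate(value):
                comparisons += 1
                pos = offset + j
                if pos >= len(packet) or packet[pos] != expected:
                    ok = False
                    break
            if not ok:
                break                         # stop this pattern at the first mismatch
        if ok:
            matched.append(i)
    return matched, comparisons


pkt = bytes(12) + b"\x08\x00"                 # EtherType IPv4 at offset 12
for n in (1, 10, 100):
    _, count = naive_match(pkt, [[(12, b"\x08\x00")]] * n)
    print(n, count)                           # prints 1 2, then 10 20, then 100 200
```

The stored pattern data grows the same way: N filters of k bytes each occupy roughly N·k bytes, matching the proportional space cost described above.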
This conventional pattern matching process is slow and time consuming because it requires many comparisons. As a result, packet inspection cannot track or analyze every packet transmitted in very high speed (e.g. 100 Mbps) networks, resulting in inaccurate analysis of network traffic. Alternatively, to ensure proper analysis, network speeds are limited by the operational speed of the protocol analyzer.
This problem of identifying packets is not limited to inspection for traffic analysis, but is also applicable to many other areas of network communication, such as packet assembly and disassembly, routing, and the like. In each of these areas, a unit of data must be analyzed to determine whether it matches one or more predetermined patterns, and appropriate actions are then taken. Conventional pattern matching approaches, as outlined above, are thus a significant factor in limiting the speeds at which network communications may operate.
Accordingly, it is desirable to provide a method and system of pattern recognition for data communications that operates at high speed and is sufficiently flexible and generalized to provide for analysis of a large variety of pattern recognition operations and implementations.