1. Field of the Invention
The present invention relates generally to the field of computer networks, and more specifically, to a system for performing pattern matching on network messages at high speed.
2. Background Information
Enterprises, including businesses, governments and educational institutions, rely on computer networks to share and exchange information. A computer network typically comprises a plurality of entities interconnected by a communications medium. An entity may consist of any device, such as a host or end station, that sources (i.e., transmits) and/or receives network messages over the communications medium. A common type of computer network is a local area network (“LAN”) which typically refers to a privately owned network within a single building or campus. In many instances, several LANs may be interconnected by point-to-point links, microwave transceivers, satellite hook-ups, etc. to form a wide area network (“WAN”) or subnet that may span an entire city, country or continent. One or more intermediate network devices are often used to couple LANs together and allow the corresponding entities to exchange information. A bridge, for example, may be used to provide a “bridging” function between two or more LANs. Alternatively, a switch may be utilized to provide a “switching” function for transferring information between a plurality of LANs at higher speed.
Typically, the bridge or switch is a computer that includes a plurality of ports, which may be coupled to the LANs. The switching function includes receiving data at a source port that originated from a sending entity, and transferring that data to at least one destination port for forwarding to a receiving entity. Conventional bridges and switches operate at the data link layer (i.e., Layer 2) of the communications protocol stack utilized by the network, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) Reference Model.
Another intermediate network device is called a router. A router is often used to interconnect LANs executing different LAN standards and/or to provide higher level functionality than bridges or switches. To perform these tasks, a router, which also is a computer having a plurality of ports, typically examines the destination address and source address of messages passing through the router. Routers typically operate at the network layer (i.e., Layer 3) of the communications protocol stack utilized by the network, such as the Internet Protocol (IP). Furthermore, if the LAN standards associated with the source entity and the destination entity are different (e.g., Ethernet versus Token Ring), the router may also re-write (e.g., alter the format of) the packet so that it may be received by the destination entity. Routers also execute one or more routing protocols or algorithms, which are used to determine the paths along which network messages are sent.
Traffic Management
Computer networks are frequently used to carry traffic supporting a diverse range of applications, such as file transfer, electronic mail, World Wide Web (WWW) and Internet applications, voice over IP (VoIP) and video applications, as well as traffic associated with mission-critical and other enterprise-specific applications. Accordingly, network managers are seeking ways to identify specific traffic flows within their networks so that more important traffic (e.g., traffic associated with mission-critical applications) can be identified and given higher-priority access to the network's resources than other, less critical traffic (such as file transfers and email). In addition, as computer networks get larger, there is also a need to balance the load going to various servers, such as web servers, electronic mail servers, database servers and firewalls, so that no single device is overwhelmed by a burst in requests. Popular Web sites, for example, typically employ multiple web servers in a load-balancing scheme. If one server starts to get swamped, requests are forwarded to another server with available capacity.
Layer 4 switches or routers have been specifically developed to perform such services. In a Layer 4 switch, the device examines both the network and transport layer headers of network messages to identify the flow to which the messages belong. Such flows are often identified by examining five network/transport layer parameters (i.e., IP source address, IP destination address, source port, destination port and transport layer protocol). By examining these five parameters, a Layer 4 switch can often identify the specific entities that are communicating and the particular upper layer (e.g., Layer 7) application being used by those entities. In particular, a defined set of well-known port numbers has been established in Request for Comments (RFC) 1700 for certain common applications. For example, port number 80 corresponds to the hypertext transfer protocol (HTTP), which is commonly used with WWW applications, while port number 21 corresponds to the file transfer protocol (FTP).
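By way of illustration only, the five-parameter flow identification described above may be sketched in software as follows. The sketch assumes an IPv4 packet carrying TCP or UDP with no link-layer framing; the names FiveTuple, extract_five_tuple, guess_application and build_sample_packet, as well as the sample addresses, are hypothetical and are not taken from any particular product.

```python
import socket
import struct
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The five network/transport layer parameters that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP


# Illustrative subset of the well-known port assignments (assumption: only
# the two ports mentioned in the text are listed here).
WELL_KNOWN_PORTS = {21: "FTP", 80: "HTTP"}


def extract_five_tuple(packet: bytes) -> FiveTuple:
    """Parse the IPv4 header and the first four bytes of the TCP/UDP header."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL field is in 32-bit words
    protocol = packet[9]
    if protocol not in (6, 17):
        raise ValueError("transport protocol is neither TCP nor UDP")
    if len(packet) < header_len + 4:
        raise ValueError("packet truncated before the port fields")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # Source and destination ports are the first two 16-bit fields of both
    # the TCP and the UDP header, in network byte order.
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return FiveTuple(src_ip, dst_ip, src_port, dst_port, protocol)


def guess_application(flow: FiveTuple) -> str:
    """Map a flow to an application name using the destination port only."""
    return WELL_KNOWN_PORTS.get(flow.dst_port, "unknown")


def build_sample_packet() -> bytes:
    """Build a hypothetical 40-byte IPv4/TCP packet addressed to port 80."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 0, 0, 64, 6, 0,  # version/IHL, TOS, length, id, frag, TTL, proto, checksum
        socket.inet_aton("10.0.0.1"),
        socket.inet_aton("192.0.2.7"),
    )
    tcp_ports = struct.pack("!HH", 49152, 80)
    return ip_header + tcp_ports + bytes(16)  # remainder of TCP header zeroed


if __name__ == "__main__":
    flow = extract_five_tuple(build_sample_packet())
    print(flow, guess_application(flow))
```

Such a lookup identifies the application only when a well-known port is in use; it yields “unknown” for any other port.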
The parsing of data packets so as to identify these network/transport layer parameters is typically performed in software by a dedicated module or library. The Internetwork Operating System (IOS®) from Cisco Systems, Inc. of San Jose, Calif., for example, includes software modules or libraries for performing such packet parsing functions. A processor, such as a central processing unit (CPU), at the network device executes the corresponding program instructions. These modules or libraries may be written in any number of well-known programming languages. The Perl programming language, in particular, is often selected because of its highly developed pattern matching capabilities. In Perl, the patterns that are being searched for are generally referred to as regular expressions. A regular expression can simply be a word, a phrase or a string of characters. More complex regular expressions include metacharacters that provide certain rules for performing the match. The period (“.”), which is similar to a wildcard, is a common metacharacter. It matches exactly one character, regardless of what the character is (other than, by default, a newline). Another metacharacter is the plus sign (“+”) which indicates that the character immediately to its left may be repeated one or more times. If the data being searched conforms to the rules of a particular regular expression, then the regular expression is said to match that data. For example, the regular expression “gauss” would match data containing gauss, gaussian, degauss, etc.
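The behavior of the literal pattern and of the two metacharacters described above may be illustrated with a short sketch. The sketch uses the re module of the Python programming language rather than Perl; for the patterns shown, the two languages treat the period and the plus sign in the same way. The helper name matches is hypothetical.

```python
import re


def matches(pattern: str, data: str) -> bool:
    """Return True if the regular expression occurs anywhere in the data."""
    return re.search(pattern, data) is not None


if __name__ == "__main__":
    # A literal regular expression matches any data that contains it.
    for data in ("gauss", "gaussian", "degauss", "gaus"):
        print("gauss", "in", data, "->", matches("gauss", data))

    # The period stands for exactly one character of any kind.
    print(matches("g.uss", "gauss"), matches("g.uss", "guss"))

    # The plus sign allows the character to its left to repeat one or more times.
    print(matches("ga+uss", "gaaauss"), matches("ga+uss", "guss"))
```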
Software modules and libraries can similarly be written to search for regular expressions beyond the five network/transport layer parameters described above. In particular, some enterprises may wish to identify network messages that are associated with applications that have not been assigned a well-known port number. Alternatively, an enterprise may be interested in identifying messages that are directed to a specific web page of a given web site. An enterprise may also wish to identify messages that are directed to or carry a particular uniform resource locator (URL). To identify such messages, an intermediate network device must examine more than just the five network/transport layer parameters described above. In this case, the actual data portions of the message(s) must be parsed for specific patterns, such as selected URLs.
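As a hedged illustration of parsing the data portion of a message for a selected URL, the following sketch inspects the payload of an unencrypted HTTP request. It assumes that the request line and the Host header both arrive within a single payload; the names requested_url and is_watched, and the example host name, are hypothetical.

```python
import re
from typing import Optional

# Request line, e.g. b"GET /products/index.html HTTP/1.1\r\n"
REQUEST_LINE = re.compile(rb"^(GET|POST|HEAD) (\S+) HTTP/1\.[01]\r\n")
# Host header, e.g. b"\r\nHost: www.example.com\r\n"
HOST_HEADER = re.compile(rb"\r\nHost:[ \t]*([^\r\n]+)\r\n", re.IGNORECASE)


def requested_url(payload: bytes) -> Optional[str]:
    """Return host + path from an HTTP request payload, or None if absent."""
    line = REQUEST_LINE.match(payload)
    host = HOST_HEADER.search(payload)
    if line is None or host is None:
        return None
    host_name = host.group(1).decode("ascii", "replace").strip()
    path = line.group(2).decode("ascii", "replace")
    return host_name + path


def is_watched(payload: bytes, watched_pattern: str) -> bool:
    """Report whether the requested URL matches a configured regular expression."""
    url = requested_url(payload)
    return url is not None and re.search(watched_pattern, url) is not None


if __name__ == "__main__":
    sample = (
        b"GET /products/index.html HTTP/1.1\r\n"
        b"Host: www.example.com\r\n"
        b"User-Agent: demo\r\n\r\n"
    )
    print(requested_url(sample))
    print(is_watched(sample, r"^www\.example\.com/products/"))
```

Unlike the five-parameter lookup, this check must read the message payload itself, and the amount of data examined grows with the length of the request.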
Intrusion Detection
In addition, security is increasingly becoming a critical issue in enterprise and service-provider networks as usage of public networks, such as the Internet, increases, and new business applications, such as virtual private networks (VPNs), electronic commerce, and extranets, are deployed. Many organizations continue to rely on firewalls as their central gatekeepers to prevent unauthorized users from entering their networks. However, organizations are increasingly looking to additional security measures to counter risk and vulnerability that firewalls alone cannot address. Intrusion Detection Systems (IDSs) analyze data in real time to detect, log, and stop misuse or attacks as they occur.
Network-based IDSs analyze packet data streams within a network searching for unauthorized activity, such as attacks by hackers. In many cases, IDSs can respond to security breaches before systems are compromised. When unauthorized activity is detected, the IDS typically sends alarms to a management console with details of the activity and can often order other systems, such as routers, to cut off the unauthorized sessions.
Network-based IDSs are typically configured to monitor activity on a specific network segment. They are usually implemented on dedicated platforms having two primary components: a sensor, which passively analyzes network traffic, and a management system, which displays and/or transmits alarm information from the sensor. The sensors capture network traffic in the monitored segment and perform rules-based or expert system analysis of the traffic using configured parameters. For example, the sensors analyze packet headers to determine source and destination addresses and type of data being transmitted. The sensors may also analyze the packet payload to discover information in the data being transmitted. Once the sensor detects misuse, it can perform various security-related actions, such as log the event, send an alarm to the management console, reset the data connection, or instruct a router to shun (deny) any future traffic from that host or network.
As is the case with intermediate network devices, it is known to incorporate software modules or libraries for analyzing packets within IDS sensors. However, the evaluation of individual packets through software is an impractical solution for both intermediate network devices and IDS sensors, both of which may be required to analyze enormous volumes of traffic. Today's computer networks can generate hundreds if not thousands of diverse traffic flows at any given time. The use of advanced network equipment, such as fiber optic transmission links and high-speed transmission protocols, such as “Gigabit” Ethernet, further increases the speeds of these traffic flows. Furthermore, regardless of the processing power of the device's CPU (e.g., 16, 32 or even 64 bit), regular expression matching can typically only be performed one byte at a time, due to programming constraints.
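The one-byte-at-a-time constraint noted above may be illustrated with a minimal software scanner for a literal pattern. The sketch uses the well-known Knuth-Morris-Pratt failure table, chosen here only as a convenient example and not as the method of any particular product; the names build_failure_table and scan are hypothetical. The scanner consumes exactly one input byte per loop iteration, so the time needed to examine a message grows with the number of bytes in that message, whatever the word width of the CPU.

```python
from typing import List, Tuple


def build_failure_table(pattern: bytes) -> List[int]:
    """For each prefix of the pattern, length of its longest proper border."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def scan(pattern: bytes, data: bytes) -> Tuple[bool, int]:
    """Scan data for pattern; return (found, number of bytes consumed)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    fail = build_failure_table(pattern)
    state = 0      # number of pattern bytes matched so far
    consumed = 0   # input bytes examined; one per loop iteration
    for byte in data:
        consumed += 1
        while state > 0 and byte != pattern[state]:
            state = fail[state - 1]
        if byte == pattern[state]:
            state += 1
        if state == len(pattern):
            return True, consumed
    return False, consumed


if __name__ == "__main__":
    print(scan(b"gauss", b"degaussing"))
    print(scan(b"gauss", b"gaus"))
```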
Thus, the current software solutions for performing regular expression matching are becoming less efficient at performing their message processing tasks as transmission rates reach such high speeds. Accordingly, a need has arisen for a system that can perform regular expression matching at the high transmission speeds of current and future computer network equipment.