Recognizing patterns within a set of data is important in many fields, including speech recognition, image processing, and seismic data analysis. Some image processors collect image data and then pre-process the data to prepare it to be correlated to reference data. Other systems, such as speech recognition systems, operate in real time, comparing the input data to reference data as it arrives in order to recognize patterns. Once the patterns are “recognized” or matched to a reference, the system may output the reference. For example, a speech recognition system may output text equivalent to the processed speech patterns. Still other systems, such as those used in biology, may use similar techniques to determine sequences in molecular strings like DNA.
In some systems, there is a need to find patterns that are embedded in a continuous data stream. In non-aligned data streams, patterns may be missed if only a single byte-by-byte comparison is implemented. This occurs when there are repeated or nested repeating patterns in the input stream or in the pattern to be detected. A reference pattern (RP) containing the sequence that is being searched for is loaded into storage where each element of the sequence has a unique address. An address register is loaded with the address of the first element of the RP that is to be compared with the first element of the input pattern (IP). This address register is called a “pointer.” In the general case, a pointer may be loaded with an address that may be either incremented (increased) or decremented (decreased). The value of the element pointed to by the pointer is retrieved and compared with input elements (IEs) that are clocked or loaded into a comparator.
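By way of illustration only, the following minimal Python sketch shows how a single pointer that is simply reset on a mismatch may miss a pattern in a non-aligned stream when the RP begins with a repeated element. The function names, the reset-to-zero policy, and the sample strings are assumptions made for this sketch; they are not taken from any particular system. A brute-force sliding-window comparison is included only as a reference for what should have been found.

```python
def single_pointer_match(reference, stream):
    """Compare the stream one element at a time using a single RP pointer.

    Assumption for this sketch: on a mismatch the pointer is reset to the
    first RP element and the mismatching input element is not re-examined.
    Returns the stream indices at which a full match was reported.
    """
    pointer = 0
    hits = []
    for index, element in enumerate(stream):
        if element == reference[pointer]:
            pointer += 1
            if pointer == len(reference):
                hits.append(index)  # index of the last matching element
                pointer = 0
        else:
            pointer = 0  # restart at the first RP element
    return hits


def sliding_window_match(reference, stream):
    """Brute-force reference: test every alignment of the RP in the stream."""
    n = len(reference)
    return [
        start + n - 1
        for start in range(len(stream) - n + 1)
        if stream[start:start + n] == reference
    ]


if __name__ == "__main__":
    rp = "AAB"     # RP that begins with a repeated element
    ip = "AAAB"    # the RP occurs starting at the second input element
    print(single_pointer_match(rp, ip))   # no match reported
    print(sliding_window_match(rp, ip))   # match ending at index 3
```

In this sketch the third input element fails against the last RP element, the pointer is reset, and the two preceding elements that could have begun a valid match are discarded, so the occurrence is missed.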
In pattern recognition, it is often desired to compare elements of an IP to many RPs. For example, it may be desired to compare an IP resulting from scanning a fingerprint (typically 1 kilobyte for certain combinations of features defined in fingerprint technology) to a library of RPs (all scan results on file). To do the job quickly, elements of each RP may be compared in parallel with elements in the IP. Each RP may have repeating substrings (short patterns), which are smaller patterns embedded within the RP. Since a library of RPs may be quite large, the processing required may be considerable. It would be desirable to have a way of reducing the amount of storage necessary to hold the RPs. If the amount of data used to represent the RPs could be reduced, the time necessary to load and unload the RPs may also be reduced. Parallel processing may also be used, where each RP, together with the IP, is loaded into a separate processing unit to determine matches.
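As a hedged illustration of why repeating substrings offer room to reduce RP storage, the Python sketch below stores each RP as a list of (substring, repeat count) segments and compares an IP against a small library without expanding the RPs. This representation, the function names, and the sample library are hypothetical and chosen only for this sketch; the text above does not prescribe any particular encoding.

```python
def expand(segments):
    """Rebuild the full RP from its (substring, repeat_count) segments."""
    return "".join(substring * count for substring, count in segments)


def stored_elements(segments):
    """Elements held in storage: each substring once plus one count per segment."""
    return sum(len(substring) + 1 for substring, _ in segments)


def matches_exactly(segments, input_pattern):
    """Compare the IP against a compact RP without expanding the RP."""
    position = 0
    for substring, count in segments:
        for _ in range(count):
            end = position + len(substring)
            if input_pattern[position:end] != substring:
                return False
            position = end
    # The IP must also end where the RP ends.
    return position == len(input_pattern)


def find_matches(library, input_pattern):
    """Return the names of all RPs in the library that equal the IP."""
    return sorted(
        name
        for name, segments in library.items()
        if matches_exactly(segments, input_pattern)
    )


if __name__ == "__main__":
    # Hypothetical library of two RPs with repeated substrings.
    library = {
        "rp_a": [("AB", 4), ("C", 1)],   # expands to "ABABABABC"
        "rp_b": [("XYZ", 3)],            # expands to "XYZXYZXYZ"
    }
    for name, segments in library.items():
        print(name, len(expand(segments)), stored_elements(segments))
    print(find_matches(library, "ABABABABC"))
```

In this sketch each RP expands to nine elements but is held in five and four stored elements respectively. The loop in `find_matches` is sequential here; in a parallel arrangement each RP comparison could be assigned to its own processing unit.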
Pattern recognition processing for biological systems may likewise require the comparison of an IP to a large number of stored RPs that have repeated substrings. Processing in small parallel processing units may be limited by the storage size required for the RPs. Portable, inexpensive processing systems for chemical analysis, biological analysis, etc., may also be limited by the amount of storage needed to quickly process large numbers of RPs.
Pattern detection or recognition is a bottleneck in many applications today, and software solutions cannot achieve the necessary performance. It is desirable to have an expandable hardware solution for matching patterns quickly. It is also desirable to have a system that allows multiple modes of pattern matching. Some applications require an exact match of a pattern in an input data stream to a desired target pattern. In other cases, it is desirable to determine the longest match, the maximum number of characters matching, or a “fuzzy” match where various character inclusions or exclusions are needed.
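The matching modes named above can be made concrete with a short software sketch. The Python functions below give one possible reading of each mode; the precise definitions (longest matching prefix, best position-wise agreement over all alignments, and a per-position include or exclude character set for the fuzzy mode) are assumptions made for illustration, as are all function names and sample strings.

```python
def exact_match(target, stream):
    """True if the whole target occurs somewhere in the stream."""
    return target in stream


def longest_prefix_match(target, stream):
    """Length of the longest prefix of the target found in the stream."""
    for length in range(len(target), 0, -1):
        if target[:length] in stream:
            return length
    return 0


def max_matching_characters(target, stream):
    """Largest number of position-wise agreements over all alignments."""
    n = len(target)
    best = 0
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        agreements = sum(a == b for a, b in zip(target, window))
        best = max(best, agreements)
    return best


def fuzzy_match(spec, stream):
    """Match a per-position specification against the stream.

    Each spec entry is (include, chars): when include is True the input
    character must be in chars; when False it must not be in chars.
    """
    n = len(spec)
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        if all((ch in chars) == include
               for (include, chars), ch in zip(spec, window)):
            return True
    return False


if __name__ == "__main__":
    stream = "xxattackyy"
    print(exact_match("attack", stream))
    print(longest_prefix_match("attach", stream))
    print(max_matching_characters("attach", stream))
    # "attac" followed by any character other than "k"
    spec = [(True, "a"), (True, "t"), (True, "t"),
            (True, "a"), (True, "c"), (False, "k")]
    print(fuzzy_match(spec, stream))
```

Each function rescans the stream for every alignment, which is acceptable for a sketch but illustrates the cost that makes software matching a bottleneck on long streams and large pattern sets.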
Intrusion Detection Systems (IDSs) provide a means to detect patterns of bytes in packets that are certainly or probably associated with malicious activity. IDSs may operate on individual hosts (host-based systems) or on network data flows (network-based systems). In either case, an IDS looks for attacks (any malicious activity) originating from outside or inside the internal network and acts much like a burglar alarm. This is essentially a pattern recognition task in which an IDS analyzes incoming data while attempting to detect known patterns (signatures) that indicate the presence of a known intruder.
Intrusion detection products are tools that assist in protecting a network from intrusion by expanding the options available to manage the risk from threats and vulnerabilities. Intrusion detection capabilities may help a company secure its information. After an attack is detected, the system may provide information about the attack. This information may be used to delete, log, or shun intruding packets. Supporting investigations then attempt to determine how the intruder breached the network security and to prevent the same method from being used by future intruders.
Malicious traffic is common on today's Internet. Current IDS performance may become a bottleneck at high bandwidth. Some attackers may launch a high-speed attack in order to overwhelm the IDS and, simultaneously, a low-speed attack in the hope that the low-speed attack will not be noticed. These attacks may be real-time, live traffic attacks. This means that computing networks must continually scan traffic to catch these malicious activities. The throughput of current IDS software is inversely proportional to the network load; hence, such software is more prone to attacks when running at higher loads. IDS software enables either comprehensive or high-speed detection, but not both.
There is, therefore, a need for a method and circuitry to form an IDS that is able to detect intrusions, identify a variety of attacks, and run at the real-time speed of a high-performance network.