Data networks are fast becoming the backbone of all types of business. As such, data networks and the data that passes through them are among the most important assets of any business. To help safeguard these assets and to properly process the data passing through them, both the incoming and the outgoing data must be scanned at ever increasing speeds to filter out unwanted content, flag important messages, and prevent unauthorized access. Scanning usually involves searching the incoming data for patterns that may denote a virus, unwanted email, or, more importantly, a relevant message from a customer.
Unfortunately, conventional software scanners cannot scan data fast enough. Furthermore, they require a large expenditure on both hardware and software.
Traditionally, it has been believed that a hardware solution is faster than a software solution. A desirable hardware solution would be a dedicated system that can be integrated into existing network components or, alternatively, built into newer models of network components. Ideally, such a solution would be implementable in silicon and would not require much area on a dedicated network component circuit board.
Regardless of whether such a solution is software or hardware based, one of the major pitfalls of scanning an incoming datastream is the “false positive”: a seemingly positive result indicating that a pattern being scanned for is present in the data when, in fact, that pattern is not present. An even more dangerous pitfall, however, is the “false negative”: a seemingly negative result for a pattern being scanned for when, in fact, that pattern is present. While the false positive merely sees target patterns where there are none, the false negative misses the target pattern when it is present.
Another major concern for scanning is the scan rate for negatives, that is, the rate at which data can be scanned to determine whether a given data set has no chance of containing a target pattern. A high scan rate means that data sets can be quickly removed from contention for the more resource-intensive process of determining whether a full target pattern is present. Unfortunately, desirable high scan rates on the order of multiple gigabits of data per second are still beyond the practical limitations of software-based scanning solutions.
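The idea of quickly removing data sets from contention can be sketched as a two-stage scan. The following is a minimal illustration only, not the method of any particular scanner: the function names, the prefix length `k`, and the choice of a pattern-prefix check as the cheap first stage are all assumptions made for the example. The first stage can only answer "definitely absent" or "possibly present", so it never produces a false negative.

```python
def build_prefilter(patterns, k=2):
    """Collect the first k bytes of every target pattern.

    k is a hypothetical tuning knob; every pattern must be at least k bytes.
    """
    if any(len(p) < k for p in patterns):
        raise ValueError("every pattern must be at least k bytes long")
    return {p[:k] for p in patterns}


def may_contain_target(data, prefixes, k=2):
    """Cheap first stage: False means no target pattern can be present."""
    return any(data[i:i + k] in prefixes for i in range(len(data) - k + 1))


def full_scan(data, patterns):
    """Expensive second stage: confirm which complete patterns are present."""
    return [p for p in patterns if p in data]


def scan(data, patterns, prefixes, k=2):
    """Run the full scan only on data sets the first stage cannot rule out."""
    if not may_contain_target(data, prefixes, k):
        return []  # fast negative: removed from contention
    return full_scan(data, patterns)


patterns = [b"virus", b"worm"]
prefixes = build_prefilter(patterns)
```

In this sketch, a data set such as `b"a video file"` passes the first stage because it contains the prefix `b"vi"`, but the second stage then finds no complete pattern. The higher the fraction of traffic the first stage rejects, the less often the expensive stage runs.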
A workable scanning solution should be able to scan the data stream for multiple target patterns. Ideally, a single pass over a given data set would check for all of the target patterns at once. Performing multiple scans or passes of the data set would seriously degrade the performance of such a solution.
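One well-known way to check for many patterns in a single pass is the Aho-Corasick automaton, shown here purely as an illustration of the single-pass requirement; the text above does not name it, and the function names `build_automaton` and `scan_once` are hypothetical. Each input byte is examined once, regardless of how many patterns are loaded.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (Aho-Corasick construction)."""
    goto = [{}]   # goto[state][symbol] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for sym in pat:
            if sym not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for sym, nxt in goto[state].items():
            f = fail[state]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(sym, 0)
            # Inherit patterns that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def scan_once(data, automaton):
    """Report (start_offset, pattern) for every match in one pass over data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, sym in enumerate(data):
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = scan_once(b"ushers", automaton)
```

Scanning `b"ushers"` this way finds `b"she"`, `b"he"`, and `b"hers"` without ever rereading the input, which is the property that repeated per-pattern passes lack.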
A further consideration is the size of the target patterns. If a scanning solution does not support long patterns, false positives are more likely to result because the patterns cannot be defined as completely.
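The link between pattern length and false positives can be shown with a small sketch. The function `scan_with_limit`, the `max_len` parameter, and the sample byte strings are all hypothetical; they simply model a scanner that can only store the first `max_len` bytes of each pattern.

```python
def scan_with_limit(data, pattern, max_len):
    """Report a match using only the first max_len bytes of the pattern."""
    return pattern[:max_len] in data


pattern = b"malicious_payload_v2"
benign = b"report: no malicious_payload found"

# With the pattern truncated to 9 bytes, the benign data appears to match.
truncated_result = scan_with_limit(benign, pattern, 9)

# With the whole pattern supported, the same data is correctly rejected.
full_result = scan_with_limit(benign, pattern, len(pattern))
```

Here the truncated pattern `b"malicious"` occurs in the benign data, so the limited scanner reports a match that the complete pattern would not produce.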