1. Field of the Invention
The present invention relates to a method and apparatus for storing pattern matching data and a pattern matching method using the method and apparatus, and more particularly, to a pattern matching data storing method and apparatus capable of quickly storing and retrieving a predetermined pattern in a variety of fields, such as an intrusion detection system, a spam mail checking system, and a high capacity database, and a pattern matching method using the method and apparatus.
2. Description of the Related Art
As high-speed networks have become commonplace, a variety of services using the networks have been introduced. With the introduction of these services, it has become possible to transmit a wide variety of information over the networks, and in this environment diverse business models have emerged.
However, because illegal copying and hacking are widespread on these networks, those who run commercial businesses need network intrusion countermeasure apparatuses and related apparatuses to protect information that generates profit or is sold for profit. The most fundamental technology underlying these apparatuses is high-speed pattern matching.
Pattern matching technology has a wide range of applications. It can be used in the search engine of an intrusion detection system to detect viruses or hacking activities, or it can be used to identify spam mail or a predetermined pattern on a network. More specifically, its applications include finding desired text in a long text file, security systems such as network intrusion detection systems and spam mail removal systems, and database systems. Accordingly, there is great demand for pattern matching technology.
At present, the pattern matching technologies used in these applications are mostly implemented in software. This is because most pattern matching technologies were developed long ago, when most applications only had to find a simple pattern in a small data set rather than compare data against a large number of patterns.
However, now that high-performance networks are widespread and huge amounts of data are routinely transmitted and managed, these software-based methods cannot find patterns in real time. Although software methods may still be used where timing is not critical, their long search times make them unsuitable for systems that must operate in real time, such as intrusion detection systems, spam mail removal systems, and network analysis systems.
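To illustrate why a purely software search scales poorly as the number of patterns grows, the following is a minimal sketch of a naive multi-pattern scan. It is not the method of the present invention or of any particular prior-art product; the function name `naive_multi_match` is hypothetical. It tries every pattern at every offset of the input, so its work grows roughly with the input length times the number of patterns times the pattern length.

```python
def naive_multi_match(data: bytes, patterns: list[bytes]) -> tuple[list[tuple[int, bytes]], int]:
    """Hypothetical naive scan: return (matches, comparisons).

    matches is a list of (offset, pattern) pairs; comparisons counts every
    byte comparison performed, as a rough measure of software cost.
    """
    matches: list[tuple[int, bytes]] = []
    comparisons = 0
    for offset in range(len(data)):
        for pat in patterns:
            # Skip patterns that would run past the end of the data.
            if offset + len(pat) > len(data):
                continue
            matched = True
            for i, b in enumerate(pat):
                comparisons += 1
                if data[offset + i] != b:
                    matched = False
                    break
            if matched:
                matches.append((offset, pat))
    return matches, comparisons


packet = b"GET /index.html HTTP/1.1"
found, cost = naive_multi_match(packet, [b"GET", b"index", b"cmd.exe"])
print(found)  # [(0, b'GET'), (5, b'index')]
```

Every additional pattern adds another comparison pass at each offset, which is why a rule set of thousands or millions of patterns cannot be checked in real time this way.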
Accordingly, to improve the performance of pattern matching, technologies that implement the matching algorithm in hardware have been introduced. However, implementing a pattern matching engine in hardware causes the following problems.
First, because the pattern set changes whenever a user specifies a new pattern, it is difficult to implement in hardware. In addition, pattern matching generally has to search the data character strings of a network packet as well as its header, and the length of the character string to be found varies greatly, from a simple one-byte rule to a string of more than 100 bytes that requires high accuracy. Furthermore, since the number of patterns a user wants to match ranges from hundreds to millions, implementing this in hardware is very difficult.
In the hash method, one of the methods described above, hash key collisions and the resulting tree-type linkage of data increase the data retrieval time and the amount of data that must be processed, making it difficult to achieve the desired performance. That is, when the number of patterns (or rules) that the user wants to compare increases, hardware resources become insufficient (and adding resources increases the cost), so implementation becomes difficult. A comparison against a huge number of patterns could be performed by investing all available resources, but the cost would then be too high. Accordingly, this cannot be regarded as an appropriate solution.
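The effect of hash key collisions on retrieval time can be illustrated with a minimal sketch of a chained hash table. This is an assumption-laden illustration, not the conventional apparatus itself: the class `ChainedPatternTable`, the helper `average_probes`, the use of CRC-32 as the hash function, and the bucket counts are all hypothetical, and a simple linear chain stands in for the tree-type linkage mentioned above. With a fixed number of buckets, adding patterns lengthens the chains, so each lookup must examine more entries.

```python
import zlib


class ChainedPatternTable:
    """Hypothetical hash table of patterns; colliding keys share one chain."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets: list[list[bytes]] = [[] for _ in range(num_buckets)]

    def _index(self, pattern: bytes) -> int:
        # CRC-32 is used as the hash function purely for illustration.
        return zlib.crc32(pattern) % len(self.buckets)

    def insert(self, pattern: bytes) -> None:
        chain = self.buckets[self._index(pattern)]
        if pattern not in chain:
            chain.append(pattern)

    def lookup(self, pattern: bytes) -> tuple[bool, int]:
        """Return (found, probes), where probes counts chain entries examined."""
        probes = 0
        for stored in self.buckets[self._index(pattern)]:
            probes += 1
            if stored == pattern:
                return True, probes
        return False, probes


def average_probes(num_patterns: int, num_buckets: int) -> float:
    """Average chain entries examined when looking up every stored pattern."""
    table = ChainedPatternTable(num_buckets)
    patterns = [f"rule-{i}".encode() for i in range(num_patterns)]
    for p in patterns:
        table.insert(p)
    total = sum(table.lookup(p)[1] for p in patterns)
    return total / num_patterns


print(average_probes(1_000, 256))   # a few probes per lookup
print(average_probes(20_000, 256))  # roughly an order of magnitude more
```

Keeping the average lookup short would require growing the table in proportion to the number of rules, which in hardware translates directly into the additional memory and cost described above.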
Examples of actual hardware-based pattern matching apparatuses are as follows. FIG. 1 illustrates an example of a conventional pattern matching apparatus, and FIG. 2 illustrates another example of a conventional pattern matching apparatus.
However, in the hardware-based pattern matching apparatuses illustrated in FIGS. 1 and 2, if the number of patterns to be compared increases, the field programmable gate array (FPGA) has to be reprogrammed. Also, when many rules are loaded, the circuit complexity and memory usage increase, making it difficult to achieve the desired speed and increasing the cost.
The number of patterns that can be compared can be increased by adding external memory. However, since external memory is considerably slower than internal memory, frequently accessing it to read and write data is not a satisfactory solution. Because of these problems, hardware-based pattern matching apparatuses can compare only a small number of patterns at high speed. Accordingly, a new technology is urgently needed for products that handle large amounts of data at high speed.