1. Field of the Invention
The present invention relates to generating a signature for network attack detection, and more particularly, to a method and apparatus for generating a highly reliable network attack signature while minimizing the whitelist used to prevent false positives.
The present invention was supported by the IT R&D program of MIC/IITA. [2006-S-042-01, program title: Development of signature generation and management technology against Zero-day Attack]
2. Description of the Related Art
One method of protecting a network or a computer is to filter packets that contain an attack signature by using pattern matching. To apply this method, a highly reliable attack signature has to be generated rapidly, and research on this has been carried out.
Automatic attack signature generation starts from the assumption that “a network attack includes a particular byte-sequence used for the attack, and that byte-sequence occurs frequently”. Examples of conventional network-based signature generation techniques based on this assumption include the following.
First, there is a method called Earlybird. In this method, hash values are calculated by using a Karp-Rabin fingerprinting scheme, the calculated hash values are sampled (for example, at a rate of 1/64), and the frequency of each sampled hash value is recorded in a table. Then, signatures that occur frequently in the network are selected from the hash values recorded in the table, and the address distribution of the corresponding packets is analyzed to generate a worm signature.
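The hashing, sampling, and counting steps described above can be sketched as follows. This is a minimal illustration and not the Earlybird implementation: the base, modulus, substring length `k`, the mask-based 1/64 sampling rule, and all function names are assumptions chosen for the example, and the address-distribution analysis is omitted.

```python
from collections import Counter

BASE = 257               # assumed polynomial base
MOD = (1 << 61) - 1      # assumed large prime modulus


def rolling_hashes(payload: bytes, k: int):
    """Yield (offset, hash) for every k-byte substring of payload,
    updating the hash incrementally in Karp-Rabin style."""
    if len(payload) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in payload[:k]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(k, len(payload)):
        h = (h - payload[i - k] * high) % MOD  # drop the oldest byte
        h = (h * BASE + payload[i]) % MOD      # append the newest byte
        yield i - k + 1, h


def count_sampled(payloads, k=8, sample_mask=0x3F):
    """Record the frequency of sampled hash values in a table.
    With sample_mask=0x3F, only hashes whose low 6 bits are zero are
    kept, which is roughly a 1/64 sampling rate."""
    table = Counter()
    for payload in payloads:
        for _, h in rolling_hashes(payload, k):
            if h & sample_mask == 0:
                table[h] += 1
    return table


def frequent(table, threshold):
    """Return the hash values seen at least `threshold` times."""
    return {h for h, count in table.items() if count >= threshold}
```

Sampling on the hash value rather than on the packet position means that the same substring is either always counted or never counted, so frequent content is not diluted by the sampling.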
Second, there is a method called Autograph. In this method, sessions suspected of being attacks are selected from among the sessions connecting to a network, that is, traffic that cannot successfully establish a session is stored, the content of the corresponding packets is reassembled, and the reassembled content is analyzed to generate a signature. Here, to separate the sessions suspected of being attacks, a suspicious-traffic detection technique such as port scan detection is mainly used, and the method of analyzing the reassembled packet content is similar to that used in the aforementioned Earlybird.
Autograph differs from Earlybird in that it operates on reassembled sessions rather than on individual packets, and in that it uses a content-based payload partitioning (COPP) scheme to extract substrings and their hash values. Therefore, the payload substrings handled in Autograph have a variable size.
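A content-based partitioning of this kind can be sketched as follows. This is only an illustration of the general idea, in which a block boundary is declared wherever a hash of the last few bytes matches a fixed value, so that block sizes depend on content. The window length, average, minimum, and maximum block sizes, the breakmark value, and the function names are assumptions, and the window hash is recomputed at each position for clarity rather than rolled.

```python
MOD = (1 << 61) - 1  # assumed large prime modulus


def window_hash(window: bytes) -> int:
    """Polynomial hash of a short byte window."""
    h = 0
    for b in window:
        h = (h * 257 + b) % MOD
    return h


def copp_blocks(payload: bytes, window=4, avg_size=16,
                min_size=4, max_size=64, breakmark=0):
    """Split payload into variable-size blocks at content-defined
    breakpoints (hypothetical parameters, for illustration only)."""
    if not (window <= min_size <= max_size):
        raise ValueError("require window <= min_size <= max_size")
    blocks = []
    start = 0
    for end in range(1, len(payload) + 1):
        size = end - start
        if size < min_size:
            continue  # never emit a block shorter than min_size
        fp = window_hash(payload[end - window:end])
        # Cut when the fingerprint hits the breakmark, or force a cut
        # at max_size so that blocks cannot grow without bound.
        if fp % avg_size == breakmark or size >= max_size:
            blocks.append(payload[start:end])
            start = end
    if start < len(payload):
        blocks.append(payload[start:])  # trailing remainder
    return blocks
```

Because a breakpoint depends only on the bytes inside the window, the same content tends to be cut at the same places even when it appears at a different offset in another session.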
Lastly, there is the Polygraph method, an extension proposed to apply the aforementioned Autograph to polymorphic worms. Polygraph shares its basic structure with Autograph but differs from the two aforementioned methods in that several substrings, instead of a single substring, are combined to generate a signature. For this, Polygraph extracts substrings called tokens, and, depending on the signature generation method, the extracted tokens are used to generate an unordered combination-type signature, an ordered signature, or a signature based on a statistical method.
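The difference between the three signature forms can be illustrated by how each would be matched against a payload. The sketch below is an assumption-laden illustration and not the Polygraph implementation: the function names are hypothetical, token extraction is not shown, and the statistical form is approximated here as a sum of per-token weights that a caller would compare against a threshold.

```python
def matches_unordered(payload: bytes, tokens) -> bool:
    """Unordered combination: every token occurs somewhere in the payload."""
    return all(token in payload for token in tokens)


def matches_ordered(payload: bytes, tokens) -> bool:
    """Ordered signature: tokens occur in the given order, without overlap."""
    pos = 0
    for token in tokens:
        idx = payload.find(token, pos)
        if idx < 0:
            return False
        pos = idx + len(token)
    return True


def token_score(payload: bytes, weights) -> float:
    """Statistical form (assumed): sum the weights of the tokens present."""
    return sum(w for token, w in weights.items() if token in payload)
```

For example, a payload containing the tokens in the order `b"GET"`, `b"HTTP/1.1"`, `b"\xff\xbf"` satisfies the unordered form for any ordering of that token list, but satisfies the ordered form only when the list is given in that same order.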
However, the network/computer protection method using pattern matching, which starts from the assumption that “a network attack includes a particular byte-sequence used for the attack, and that byte-sequence occurs frequently”, has the problem of a high false-positive rate. To mitigate this problem, the three aforementioned methods employ a whitelist.
The whitelist is a kind of database maintained so that common byte-sequences that should not be used as attack signatures are not repeatedly generated as attack signatures.
Representative contents of the whitelist are application protocol headers. For example, since web traffic is based on the hypertext transfer protocol (HTTP), methods used in HTTP, such as GET, occur more frequently than other byte-sequences in a payload. The same applies to other applications such as peer-to-peer (P2P), file transfer protocol (FTP), simple mail transfer protocol (SMTP), and the like.
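The role of the whitelist can be sketched as a filter applied to candidate byte-sequences before they are emitted as signatures. The whitelist entries, the exact-match rule, and the function name below are assumptions made for illustration; the conventional methods may match whitelist entries differently.

```python
# Hypothetical whitelist of common protocol keywords (illustrative only).
WHITELIST = {
    b"GET ", b"POST ", b"HTTP/1.1",   # HTTP
    b"USER ", b"RETR ",               # FTP
    b"HELO ", b"MAIL FROM:",          # SMTP
}


def filter_candidates(candidates, whitelist=WHITELIST):
    """Drop frequent byte-sequences that are known benign protocol
    keywords, keeping the rest as signature candidates."""
    return [c for c in candidates if c not in whitelist]
```

With this sketch, `filter_candidates([b"GET ", b"\x90\x90\x90\x90"])` keeps only the second candidate, since `b"GET "` is frequent merely because it is part of normal HTTP traffic.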
However, ten thousand or more application protocols are used on the Internet, and the number of keywords (or methods) they use is very large. Therefore, when all of these are included in the whitelist, the whitelist becomes large and the time taken to search it increases, which delays attack signature generation and real-time application.