1. Field of the Invention
The present invention relates to an apparatus and method used for generating a signature of a network attacking packet and, more particularly, to an apparatus and method for extracting signature candidates and optimizing a corresponding signature for protecting a target network from a malicious program or use.
This work was supported by the IT R&D program of MIC/IITA [2006-S-042-01, Development of Signature Generation and Management Technology against Zero-day Attack].
2. Description of the Related Art
In general, technologies for detecting attacking packets are divided into anomaly detection schemes and signature detection schemes. Although the anomaly detection scheme can detect unknown attacks, it has the disadvantage of a high false positive, that is, the proportion of all normal data that the system falsely determines to be an attack. In contrast, the signature detection scheme is highly accurate but cannot detect unknown attacks, and when a new attack appears, it takes a long time to generate a signature for it. To overcome the shortcomings of these two schemes, another conventional technology was introduced that automatically generates a signature for an attack by analyzing network packets when the attack occurs on a related network.
To generate a signature, the payloads of packets must be analyzed. In early-stage worms, the attacking packets have identical payloads or share a predetermined identical part in their payloads. As attacks have become more intelligent, as in the case of polymorphic worms, the number of attacking packets having the same payload has decreased significantly, and the location of the identical part within the payload also changes.
Representative technologies for detecting attacking packets and generating signatures therefor can be summarized as the following three technologies.
As the first conventional technology, the early bird was introduced. The early bird extracts signature candidates from all network traffic. To extract the candidates, the payload of each network packet is divided into substrings of a predetermined constant length (k bytes) using a moving window scheme. That is, the first substring consists of the first through kth bytes of the payload, the second substring consists of the second through (k+1)th bytes, and so on. If the payload size of a packet is x bytes, a total of x−k+1 substrings are generated. The analysis unit is the result of hashing the combination of a separated substring and header field information of the corresponding packet. These hash values are sampled at a rate of, for example, 1/64, and the frequency of each sampled hash value is recorded in a separate table. A hash value in the table that appears frequently on the network is then extracted as a signature candidate. Based on the extracted signature candidates, a final attack signature is generated by analyzing the address dispersion of the packets and the correlation thereof.
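The moving window, hashing, and sampling steps described above can be illustrated with the following minimal sketch. It is not the early bird implementation itself: the use of the destination port as the header field, the BLAKE2b hash, the window length k=4, and the function names are all assumptions chosen only for illustration, and the later address dispersion analysis is omitted.

```python
import hashlib
from collections import Counter

def substrings(payload: bytes, k: int):
    """Yield every k-byte window; a payload of x bytes yields x - k + 1 windows."""
    for i in range(len(payload) - k + 1):
        yield payload[i:i + k]

def window_hash(sub: bytes, dst_port: int) -> int:
    # Assumption: the header field combined with the substring is the
    # destination port; the hash function is an arbitrary stand-in.
    data = sub + dst_port.to_bytes(2, "big")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def count_sampled(packets, k=4, sample_mod=64):
    """Count sampled hash values over (payload, dst_port) pairs."""
    counts = Counter()
    for payload, dst_port in packets:
        for sub in substrings(payload, k):
            h = window_hash(sub, dst_port)
            if h % sample_mod == 0:  # keeps roughly 1/sample_mod of the values
                counts[h] += 1
    return counts

def candidates(counts, threshold):
    """Hash values seen at least `threshold` times become signature candidates."""
    return {h for h, c in counts.items() if c >= threshold}
```

Sampling on the hash value, rather than on packets, means that a given substring is either always counted or never counted, so the recorded frequencies of the sampled values remain exact.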
As the second conventional technology, the autograph was introduced. The autograph extracts signature candidates only from the traffic of sessions suspected of being attacks, for example, access attempts that fail to establish a session, among the sessions accessing a network. To identify the attack-doubtful sessions, anomaly detection schemes such as a portscan scheme are used. For each attack-doubtful session, the payloads of the session are sequentially combined, the resulting long string is separated using a content-based payload partitioning (COPP) scheme, and signature candidates are extracted using the separated substrings as the analysis unit. A final attack signature is then generated based on the number of attack-doubtful sessions. The COPP scheme defines an anchor as a predetermined value and separates substrings at the locations where the defined anchor appears. For example, the string from the kth anchor to the (k+1)th anchor is separated as one substring. Therefore, the substrings produced by the COPP scheme differ in length.
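The anchor-based separation described above can be sketched as follows. This is only an illustration under stated assumptions: a CRC-32 over a short window stands in for whatever fingerprint the actual scheme uses, and the window size, modulus, anchor value, and function name are hypothetical.

```python
import zlib

def copp_partition(stream: bytes, window=4, avg=16, anchor=7):
    """Split a reassembled byte stream into variable-length blocks.

    A block ends wherever the fingerprint of the last `window` bytes,
    taken modulo `avg`, equals the predetermined `anchor` value.
    """
    blocks, start = [], 0
    for end in range(window, len(stream) + 1):
        fingerprint = zlib.crc32(stream[end - window:end])  # stand-in fingerprint
        if fingerprint % avg == anchor:
            blocks.append(stream[start:end])
            start = end
    if start < len(stream):
        blocks.append(stream[start:])  # trailing bytes after the last anchor
    return blocks
```

Because each boundary depends only on the bytes inside the window, inserting or deleting bytes near the start of the stream shifts the later boundaries by the same amount instead of changing them, which is what makes the partitioning content-based.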
As the third conventional technology, the polygraph was introduced. The polygraph is an extension of the autograph technology that applies the autograph to polymorphic worms. Like the autograph, the polygraph extracts signature candidates from traffic suspected of being an attack. The signature candidates are extracted by applying a longest substring algorithm to attack flows of the same type. Herein, the extracted signature candidates are the longest substrings that belong to more than k flows among a total of n flows. As a method of optimizing a signature, the polygraph introduces methods of combining the extracted signature candidates: a method of generating a combined signature without a predetermined order, a method of generating a combined signature with a predetermined order, and a method of statistically generating a combined signature. Because the autograph and polygraph reassemble the packets of a session and analyze the reassembled packets, they can advantageously detect an attack signature that appears across multiple consecutive packets. On the other hand, it is difficult to implement the autograph and the polygraph for a high-speed network because of the processing power required to reassemble sessions and the memory access delay.
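The extraction of the longest substrings belonging to more than k of n flows can be illustrated with the brute-force sketch below. It deliberately replaces the suffix tree with exhaustive enumeration, so it is suitable only for short example flows; the function name and the minimum length parameter are assumptions made for illustration.

```python
def common_tokens(flows, k, min_len=2):
    """Return the maximal substrings that occur in more than k of the flows."""
    support = {}
    for flow in flows:
        seen = set()  # count each substring at most once per flow
        for i in range(len(flow)):
            for j in range(i + min_len, len(flow) + 1):
                seen.add(flow[i:j])
        for s in seen:
            support[s] = support.get(s, 0) + 1
    frequent = {s for s, c in support.items() if c > k}
    # Keep only substrings not contained in a longer frequent substring.
    maximal = [s for s in frequent
               if not any(s != t and s in t for t in frequent)]
    return sorted(maximal, key=lambda s: (-len(s), s))
```

For example, for the three flows `b"GET /a HTTP"`, `b"xxGET /b HTTPyy"`, and `b"zzzz"` with k=1, the substrings `b"GET /"` and `b" HTTP"` are returned, since each appears in two flows and neither is contained in a longer shared substring.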
Hereinafter, the problems of conventional technologies will be described.
The autograph and polygraph extract signatures only from flows detected by anomaly detection. Therefore, the autograph and polygraph inherit the false negative of the anomaly detection. Herein, the false negative is the proportion of all attacking data that is falsely determined to be normal data. These methods reassemble the detected traffic flows and find the strings that commonly appear in the reassembled flows. Therefore, the complexity of the two methods is closely related to the number of detected traffic flows. That is, if the false negative of the anomaly detection is lowered, the number of flows increases, and processing takes longer. Conversely, if the false negative is raised, the processing time is shortened, but the false negative of the extracted signature candidates increases. Currently, these methods use an anomaly detection scheme for detecting scanning worms. As described above, these two methods perform analysis after reassembling flows. Therefore, to implement these two methods in hardware, the flow reassembly must also be implemented in hardware.
Furthermore, the entire packet contents of each flow must be stored. Therefore, a large amount of memory is required to store the contents when many abnormal flows are generated. If the number of flows increases beyond the available resources, the false negative of the extraction results may increase. In the autograph, the COPP method is used to divide the flows into analysis units. The COPP method is weak against polymorphic attacks. In the polygraph, a suffix tree is used to divide the flows into analysis units. Building the suffix tree has a computational complexity proportional to the sum of the lengths of the flows having the same abnormal feature. To break the suffix tree down after it is built, the tree must be traversed. Such a traversal requires many memory accesses, which makes it difficult to implement this method in on-line hardware.
The early bird checks all traffic and extracts frequently appearing strings as signature candidates. Therefore, the early bird has a lower false negative than the autograph and polygraph. The early bird can also, in principle, be implemented in hardware. However, this system is weak against polymorphic worms. To deal with a polymorphic attack, the payload of a network packet must be divided into short units, and the divided short units must be analyzed. The early bird, however, performs the analysis on entire packets or on 40-byte strings, or performs sampling. If the analysis object is long, it seldom appears in network traffic, except in the case of application program headers.
Therefore, if a given analysis object appears frequently, the cause is either that the analysis object really appears frequently or that hash collisions occur frequently. Herein, the number of extracted signature candidates can be significantly reduced by removing the cases caused by frequent hash collisions. However, when the analysis object is shortened, an analysis object that appears frequently is, in most cases, one that really appears frequently on the network. That is, in the early bird, removing hash collisions is not an effective method for a short analysis object. As a result, the number of outputs of this step remains large, which increases the number of entries to be analyzed later and causes a problem in a hardware system operating with limited resources.