1. Field of the Invention
The present invention relates to a traffic analysis apparatus and a traffic analysis method for analyzing the characteristics of traffic on a network. In particular, the present invention relates to a traffic analysis apparatus and a traffic analysis method for efficiently detecting, within a large volume of traffic, traffic that requires and uses an extraordinarily large amount of bandwidth, and for detecting and indicating the characteristics of that traffic.
2. Description of the Related Art
As the use of the Internet and LANs has grown, the stable operation of these networks has become dramatically more important. A huge and unspecified number of users download and employ a great variety of applications that are available on the Internet. Consequently, the probability is high either that the volume of regular traffic will increase and eventually exceed the volume estimated by, for example, Internet service providers, or that there will be a drastic increase in malicious traffic that distributes malware such as worms and viruses. How to detect such varied traffic and how to ascertain its characteristics has therefore become a problem for which a solution is urgently required.
As means for resolving this problem, JP-A-2005-285048 discloses a technique for identifying excessive and malign transmissions within a large volume of traffic flowing via a large-scale network, such as the Internet backbone, so that their characteristics can subsequently be extracted. According to this technique, frequent traffic, i.e., traffic that is probably excessive or malign, is extracted from a large volume of traffic data using a basket analysis method, which facilitates the analysis of a large amount of data and the extraction, from the data, of combinations of items that appear together with high frequency. A further feature of this technique is that the analysis can be performed by referring only to the header portion of the traffic data, which is required for transmission via a network.
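The idea of extracting frequently co-occurring header items can be illustrated with a minimal sketch. It is a brute-force frequent-itemset count written for illustration only, not the method disclosed in JP-A-2005-285048; the function name `frequent_itemsets`, the header field names, the sample addresses, and the support threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every item combination up to max_size and keep the frequent ones.

    Brute-force illustration; a practical basket analysis would prune
    candidates (for example, Apriori-style) instead of enumerating all of them.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so tuples compare equal
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical packet headers: only header fields are used, never payloads.
headers = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "dport": "80"},
    {"src": "10.0.0.1", "dst": "10.0.0.8", "dport": "445"},
    {"src": "10.0.0.1", "dst": "10.0.0.7", "dport": "445"},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "dport": "80"},
    {"src": "10.0.0.1", "dst": "10.0.0.6", "dport": "445"},
]

# Each header becomes a "basket" of field=value items.
transactions = [[f"{k}={v}" for k, v in h.items()] for h in headers]
result = frequent_itemsets(transactions, min_support=3)

for combo, n in sorted(result.items(), key=lambda kv: -kv[1]):
    print(n, combo)
```

In this sample, the combination of source 10.0.0.1 and destination port 445 is reported as frequent, whereas no combination that includes a destination address reaches the threshold, because the destinations differ from packet to packet.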
Further, as a traffic analysis method, cardinality, i.e., the “number of varieties”, has drawn attention; one example is the number of destination hosts with which a specific host communicates. Cardinality can be employed to provide a parameter that is characteristic of a specific type of traffic. When cardinality is employed, an attack that is hard to identify using only simple information, such as the quantity of communication data, or malign traffic whose purpose is network scanning, can be identified comparatively accurately. Cardinality information can also be obtained by referring only to the header information portion of traffic data, which is required for transmission via a network. Generally, in order to obtain a count for cardinality, all values that appear (e.g., the addresses of opposite communication parties, when the number of such parties is to be counted) must be stored, and for this, a large memory capacity is required. As one method for providing a solution to this problem, a technique is disclosed in NetHost: Aggregation of Traffic Summary Per-Host, 2006 IEICE General Conference, BS-5-2. According to this technique, instead of directly storing a target value, a hash value is calculated, and the bit on a bitmap that corresponds to the hash value is set to record that the target value has appeared. In this manner, the required memory size can be reduced, and the cardinality can be counted from the hash values recorded in the bitmap.
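The bitmap approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited NetHost technique: the class name `BitmapCardinalityCounter`, the bitmap size of 1024 bits, the use of SHA-256 as the hash function, and the linear-counting estimator used to correct for hash collisions are all choices made here for the example.

```python
import hashlib
import math


class BitmapCardinalityCounter:
    """Approximate count of distinct values using one bit per hash bucket."""

    def __init__(self, num_bits=1024):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)  # fixed memory, independent of input size

    def _index(self, value):
        # Map the value to a bit position; the value itself is never stored.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        i = self._index(value)
        self.bits[i // 8] |= 1 << (i % 8)

    def set_bits(self):
        return sum(bin(byte).count("1") for byte in self.bits)

    def estimate(self):
        # Linear counting (an assumption of this sketch): n ~= -m * ln(zero_bits / m).
        zero_bits = self.num_bits - self.set_bits()
        if zero_bits == 0:
            return float("inf")  # bitmap saturated; a larger bitmap is needed
        return -self.num_bits * math.log(zero_bits / self.num_bits)


# Hypothetical destination addresses contacted by one source host.
counter = BitmapCardinalityCounter(num_bits=1024)
for _ in range(2):  # each address is seen twice; repeats set no new bits
    for i in range(100):
        counter.add(f"192.0.2.{i}")

print("bits set:", counter.set_bits())
print("estimated distinct destinations:", round(counter.estimate()))
```

The counter occupies 128 bytes regardless of how many packets are observed, whereas storing the addresses themselves would require memory proportional to the number of distinct values.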
According to the conventional art in JP-A-2005-285048, since a data mining technique is employed for the extraction of excessive or malign traffic, a large amount of traffic can be processed rapidly, without imposing any limitations on the target being monitored and by employing only the header information of packets. However, since information that is useful for identifying traffic characteristics, such as cardinality, is not collected, it is not possible to determine the source applications of the frequent traffic data that were extracted, nor is it possible to determine what types of malign traffic were detected.
Further, the technique described in NetHost: Aggregation of Traffic Summary Per-Host, for example, may be combined with the technique described in JP-A-2005-285048 as means for collecting additional analysis information. However, the technique described in JP-A-2005-285048 is a method whereby data mining is performed without physically limiting the traffic to be monitored, while information related to multiple traffic types is stored at the same time. Thus, when this technique and the one in NetHost: Aggregation of Traffic Summary Per-Host are employed together, a cardinality counting memory must be prepared for each of the multiple traffic types that are currently being analyzed. As a result, a very large total memory capacity is required.
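The scale of this memory requirement can be illustrated with simple arithmetic. The figures below are hypothetical and chosen only for the example; neither cited reference states a number of concurrently analyzed traffic types or a bitmap size, and the function name `total_bitmap_bytes` is likewise an assumption.

```python
def total_bitmap_bytes(num_traffic_types, bits_per_bitmap):
    """Memory needed when one cardinality bitmap is kept per traffic type."""
    bytes_per_bitmap = (bits_per_bitmap + 7) // 8  # round up to whole bytes
    return num_traffic_types * bytes_per_bitmap


# Assumed values: 100,000 traffic types under analysis, 1024-bit bitmaps.
total = total_bitmap_bytes(num_traffic_types=100_000, bits_per_bitmap=1024)
print(total, "bytes")  # 12,800,000 bytes, i.e. 12.8 MB for a single counted item
```

The total grows linearly with the number of traffic types, and it is multiplied again if more than one kind of cardinality (for example, destination addresses and destination ports) is counted for each type.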