With the widespread use of the Internet, cyber attacks, including Distributed Denial of Service (DDoS) attacks and the transmission of spam emails, are rapidly increasing. Most of these attacks are caused by malicious software called malware. Attackers infect the terminals or servers of ordinary users with malware and control the malware to operate those terminals or servers illegally, so as to collect information or launch new attacks. These attacks have become a social problem. Thus, the need to take measures against cyber attacks centered on malware infection has become urgent.
Measures against cyber attacks have been discussed both on terminals and on networks. As measures on a terminal, techniques using anti-virus software and techniques using a host-based Intrusion Detection System (IDS) or a host-based Intrusion Prevention System (IPS) have been discussed. In these techniques, the measures are taken by installing software on the terminal. On the other hand, as measures on a network, techniques using a network-based IDS or IPS and techniques using a firewall or a Web Application Firewall (WAF) have been discussed. In these techniques, inspection devices are arranged at the connection points of the network. Furthermore, for example, Security Information and Event Management (SIEM) services that detect traces of an attack by analyzing the logs of terminals or devices have been provided. All of these methods take measures based on information prepared in advance about the characteristics of known attacks.
To collect information about the communications of such attacks for use in these techniques, a decoy system called a honeypot is used to collect the other ends and contents of the communications of malware infection attacks or other cyber attacks. Alternatively, a malware analysis system called a sandbox is used to actually run malware in order to collect the other ends and contents of the communications of the malware. Alternatively, an anti-spam-email system or an anti-DDoS system is used to collect the other ends and contents of communications determined to be attacks. Furthermore, characteristic information, including the Uniform Resource Locator (URL) or the Internet Protocol (IP) address of the destination, is extracted from the information about the communications associated with the attacks. In such extraction, an existing technology such as machine learning is often used to extract the characteristic information automatically. In such a technology, the information about the communications associated with the attacks is classified into predetermined items including the date and time, the IP address of the other end of the communication, the port number used for the communication, the number of communications in a given period of time, and the amount of communication traffic. Each of the items is then aggregated. In the aggregation, observed values are often used as they are for items such as the date and time or the port number. On the other hand, statistics including the average, the standard deviation, and the variance are sometimes used for items such as the number of communications or the amount of communication traffic. After the aggregates are calculated, for example, a search for statistical outliers is conducted. When an outlier is found in an item, the communication associated with the outlier is determined to be an attack.
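The aggregation and outlier search described above can be illustrated with a minimal sketch. It assumes that the aggregated item is the number of communications per other-end IP address and that an outlier is a value more than two standard deviations above the average. The record layout, the threshold, and the names `count_by_peer` and `find_outliers` are hypothetical and are not taken from any particular product.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical traffic records: (IP address of the other end, port number).
# Nine peers communicate 2 to 4 times each; one peer communicates 60 times.
records = []
for i in range(1, 10):
    records += [(f"192.0.2.{i}", 80)] * (2 + i % 3)
records += [("198.51.100.7", 25)] * 60


def count_by_peer(records):
    """Aggregate the number of communications per other-end IP address."""
    return Counter(ip for ip, _port in records)


def find_outliers(counts, threshold=2.0):
    """Return the IP addresses whose count lies more than `threshold`
    standard deviations above the average (an assumed outlier criterion)."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all peers behave identically, so nothing stands out
    return sorted(ip for ip, n in counts.items() if (n - mu) / sigma > threshold)


counts = count_by_peer(records)
outliers = find_outliers(counts)
print(outliers)
```

In this sketch only the peer with 60 communications exceeds the threshold. A real system would aggregate several items at once and would likely use a more robust criterion than a plain standard-deviation test.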
Meanwhile, the outlier of the item is adopted as a rule for searching for attacks and is also specified as characteristic information about an attack. Furthermore, for example, the IP addresses associated with the found attacks are blacklisted. The blacklist may be used as characteristic information for determining that a communication with a listed IP address is an attack. Note that the URLs of the other ends may also be blacklisted. In such a case, the URLs may be blacklisted using regular expressions. Note that, when traffic logs or alerts are collected from different devices or different types of software in order to extract information about the other ends or contents of the communications, the notations representing the items vary depending on the device or the type of software. However, technologies that convert log entries represented in different notations into log entries represented in a unified notation so that the items can be aggregated have become widespread as SIEM products.
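As an illustration of the blacklisting and notation unification described above, the following minimal sketch converts log entries from two differently notated sources into a unified notation and then checks them against a blacklist of IP addresses and URL regular expressions. The field names (`dst_ip`, `DestinationAddress`, `RequestURL`), the blacklist contents, and the helper names are all hypothetical and do not correspond to any particular device or SIEM product.

```python
import re

# Hypothetical blacklist built from previously found attacks.
BLACKLISTED_IPS = {"198.51.100.7"}
BLACKLISTED_URL_PATTERNS = [
    re.compile(r"^http://malware\.example\.com/.*\.exe$"),
]


def normalize(entry):
    """Convert a device-specific log entry into a unified notation.

    Two assumed notations are handled: one using `dst_ip`/`url` and
    one using `DestinationAddress`/`RequestURL`.
    """
    if "dst_ip" in entry:
        return {"dest_ip": entry["dst_ip"], "url": entry.get("url", "")}
    if "DestinationAddress" in entry:
        return {
            "dest_ip": entry["DestinationAddress"],
            "url": entry.get("RequestURL", ""),
        }
    raise ValueError("unknown log notation")


def is_blacklisted(unified):
    """Determine whether a unified entry matches the blacklist."""
    if unified["dest_ip"] in BLACKLISTED_IPS:
        return True
    return any(p.search(unified["url"]) for p in BLACKLISTED_URL_PATTERNS)


logs = [
    {"dst_ip": "192.0.2.10", "url": "http://www.example.org/index.html"},
    {"DestinationAddress": "198.51.100.7", "RequestURL": ""},
    {"dst_ip": "203.0.113.5", "url": "http://malware.example.com/payload/a.exe"},
]
flags = [is_blacklisted(normalize(entry)) for entry in logs]
print(flags)
```

Unifying the notation first means that the blacklist check is written once, regardless of which device or software produced the log entry.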