In a drive-by download attack, a client that has accessed a website serving as the starting point of the attack (hereinafter, referred to as a landing URL) is transferred through a plurality of websites (hereinafter, referred to as redirect URLs), mainly by means of hypertext markup language (HTML) tags and code such as JavaScript (registered trademark), and is finally transferred to a malicious website (hereinafter, referred to as an exploit URL) that executes attack code. When the client accesses the exploit URL, attack code exploiting a vulnerability of a browser or a browser plug-in (hereinafter, referred to as a plug-in) is executed, and the client is forced to download and install a malicious program (malware) such as a computer virus from a specific website (hereinafter, referred to as a malware distribution URL).
There are various methods for transferring the client to a specific URL: a method in which the client is transferred to a URL designated with an HTML tag, a method in which the client is transferred to a URL designated by code such as JavaScript, and a method in which the client is transferred using a hypertext transfer protocol (HTTP) status code in the 300 range (3xx redirection). The client can also be transferred by dynamically generating an HTML tag with code such as JavaScript and inserting the tag into the HTML loaded in the browser, in which case the client is transferred to the URL designated by the inserted tag. The drive-by download attack transfers the client that has accessed the landing URL to the malware distribution URL by combining these various kinds of transfer code.
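The static transfer methods described above can be illustrated with a minimal sketch. It assumes that one HTTP response is already available as a status code, a header dictionary, and an HTML body. The names `TransferTagParser` and `find_transfers` and the example hosts are hypothetical and chosen only for illustration. Transfers produced by executing JavaScript, including dynamically inserted tags, are not covered, because finding them requires running the script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TransferTagParser(HTMLParser):
    """Collects URLs that tags in an HTML document would transfer a client to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.targets = []  # list of (method, absolute URL)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("iframe", "frame", "script") and attr.get("src"):
            # Tags that make the browser fetch another URL.
            self.targets.append((tag, urljoin(self.base_url, attr["src"])))
        elif tag == "meta" and (attr.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "0; url=/next".
            content = attr.get("content") or ""
            _, _, rest = content.partition(";")
            key, _, value = rest.strip().partition("=")
            if key.strip().lower() == "url" and value:
                target = urljoin(self.base_url, value.strip(" '\""))
                self.targets.append(("meta-refresh", target))


def find_transfers(url, status, headers, body):
    """Return (method, URL) pairs for HTTP 3xx and HTML tag transfers."""
    targets = []
    # Assumption: header names are given in canonical case ("Location").
    if 300 <= status < 400 and "Location" in headers:
        targets.append(("http-3xx", urljoin(url, headers["Location"])))
    parser = TransferTagParser(url)
    parser.feed(body)
    parser.close()
    return targets + parser.targets
```

Calling `find_transfers` on each response obtained while following a landing URL would yield the statically visible part of the chain of redirect URLs.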
In many cases, the website used as the landing URL is either a website whose URL is contained in spam mail or in a message sent through a social networking service (SNS), or a general website that has been compromised by an attacker. In particular, the case in which a general website is compromised, becomes a landing URL, and is thereby involved in the drive-by download attack has a large impact and causes a large number of malware infections. Website compromises continue to occur, and it is therefore necessary to detect a compromise quickly, to identify and correct the affected content of the compromised website (for example, transfer code inserted by the compromise), and thereby to prevent infections from spreading through the drive-by download attack.
Known methods for detecting the drive-by download attack include a method in which a change in a file system caused by the download of malware from a malware distribution URL is detected (see Non-Patent Document 1), and a method in which malicious JavaScript is detected by executing the JavaScript with an emulator of a browser (hereinafter, referred to as a browser emulator) and analyzing the execution result (see Non-Patent Document 2).
Also known are a method in which the link structure from a landing URL to a malware distribution URL is identified and then traced in reverse order from the malware distribution URL, so that other malicious websites present in the vicinity of the malicious website can be searched for efficiently (see Non-Patent Document 3), and a URL signature generation method in which link structures are collected by patrolling a plurality of websites and the URLs common to those link structures are identified, so that access to malicious URLs, such as an exploit URL conducting a drive-by download attack and a malware distribution URL, can be detected and blocked efficiently (see Non-Patent Document 4).
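The idea of tracing a link structure in reverse order can be sketched as follows. This is only an illustration and not the procedure of Non-Patent Document 3: it assumes that the observed transfers are available as (source URL, destination URL) pairs, and the function name `trace_back` and the short URL labels in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def trace_back(edges, start_url):
    """Return every URL from which start_url can be reached via transfers."""
    # Invert the link structure: destination -> set of sources.
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    # Breadth-first walk over the inverted edges.
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for parent in parents[url]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    seen.discard(start_url)
    return seen
```

For example, given the transfers `[("L1", "R1"), ("L2", "R1"), ("R1", "E"), ("E", "M")]`, tracing back from the malware distribution URL `"M"` returns `"E"`, `"R1"`, and both landing URLs `"L1"` and `"L2"`.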
However, all of the above-mentioned methods detect malicious URLs; they cannot identify the content or the script involved in the attack within the website of a detected malicious URL. That is to say, when the landing URL is a compromised website, the compromised portion of the content in that website cannot be identified.
As a method for detecting website compromise, a method in which the content before compromise (original content) is compared with the content after compromise is known. For example, known methods include a method in which the comparison and detection are performed using HTML as the original content (see Non-Patent Document 5), and a method in which the comparison and detection are performed using a well-known JavaScript library or framework as the original content (see Non-Patent Document 6).
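A comparison against original content can be sketched with a line-based diff. This is a minimal illustration and not the method of Non-Patent Document 5 or 6: it assumes that both versions are available as text, and the function name `inserted_lines` is hypothetical.

```python
import difflib


def inserted_lines(original, current):
    """Return (line number in current, text) for lines absent from original."""
    orig_lines = original.splitlines()
    cur_lines = current.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=cur_lines, autojunk=False)

    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce lines that exist only in current.
        if tag in ("insert", "replace"):
            for j in range(j1, j2):
                added.append((j + 1, cur_lines[j]))
    return added
```

If an `iframe` tag has been inserted into otherwise unchanged HTML, the function reports that line together with its line number in the current content. A line-based diff of this kind would also flag legitimate updates, so it is useful only when trustworthy original content is available.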
In addition, there is a tool called TripWire (see Non-Patent Document 7) that monitors files stored in advance on a web server and, when it detects an operation such as a change to or deletion of the contents of any of those files, notifies the web server manager of the detection by e-mail.
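The general idea of file monitoring can be sketched by recording a digest for each file and comparing a later snapshot against that baseline. This is not TripWire's implementation or interface. The function names `snapshot` and `compare` are hypothetical, SHA-256 is an assumed choice of digest, and the e-mail notification is omitted.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path relative to root to the SHA-256 digest of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(baseline, current):
    """Classify files as changed, deleted, or added relative to the baseline."""
    common = baseline.keys() & current.keys()
    return {
        "changed": sorted(k for k in common if baseline[k] != current[k]),
        "deleted": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
    }
```

A monitor of this kind would call `snapshot` once on the known-good document root, call it again periodically, and pass both results to `compare`. It reports which files changed, but not which part of a file was altered.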