With the development of Internet technology, there are more and more pornographic websites, fraudulent phishing websites, and websites hosting Trojan horses, which greatly threaten the security of the information users access on the Internet. It is therefore necessary to detect whether a webpage has a malicious attribute before the browser acquires and parses the page content. If the webpage has a malicious attribute, the user is warned that the webpage currently being accessed is malicious, so as to safeguard the user's Internet browsing.
Traditional methods for detecting a malicious attribute of a webpage include signature-based detection, behavior-based detection, the sandbox filtering technique, and honeypots. Signature-based detection works as follows: samples of malicious code are collected, their characteristic instruction sequences are analysed, and the resulting signatures are stored in a signature library. When a webpage is checked for a malicious attribute, the scanned documents are compared against the signature library; if any document fragment matches a signature, the webpage is determined to have a malicious attribute. Behavior-based detection instead examines what a program does, such as adding an item to the registry startup entries, modifying content under the system folder, or calling special or rarely seen API functions at an abnormal frequency. If such behaviors are detected, the webpage is determined to have a malicious attribute.
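The signature-matching step described above can be sketched as follows. This is a minimal illustration only: the library contents, the signature names, and the function names are hypothetical, and a practical scanner would use far richer signatures than fixed byte strings.

```python
from typing import Dict, List

# Hypothetical signature library: signature name -> byte pattern that was
# extracted from a known malicious sample. The entries are made up.
SIGNATURE_LIBRARY: Dict[str, bytes] = {
    "demo-shellcode-stub": b"\x90\x90\x90\x90\xeb\x10",
    "demo-hidden-iframe": b"<iframe width=0 height=0",
}


def match_signatures(
    document: bytes, library: Dict[str, bytes] = SIGNATURE_LIBRARY
) -> List[str]:
    """Return the names of all signatures that occur as a fragment of the document."""
    return [name for name, pattern in library.items() if pattern in document]


def has_malicious_attribute(document: bytes) -> bool:
    """A document is flagged as soon as at least one signature matches."""
    return bool(match_signatures(document))
```

Because the check is a plain substring comparison, it only recognizes code whose signature is already in the library.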
Most of the malicious code in malicious webpages is written in JavaScript, and this code is usually obfuscated or encrypted to evade detection. Against obfuscated or encrypted JavaScript code, the most effective approach is the sandbox filtering technique: the JavaScript code in a webpage is parsed and executed in a virtual environment by a built-in HTML and JavaScript parsing engine, and its behavior is tracked during parsing. If behaviors such as creating ActiveX controls or making a concentrated series of large memory allocations are observed, the webpage is determined to have a malicious attribute.
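The decision step of such a sandbox can be sketched as below, assuming the sandbox has already executed the script and recorded a trace of events. The `TraceEvent` type, the event names, and both thresholds are illustrative assumptions, not values given in the text, and the sketch does not itself execute any JavaScript.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceEvent:
    """One event recorded by a (hypothetical) sandbox while running page script."""
    kind: str      # assumed event names: "create_activex" or "alloc"
    size: int = 0  # bytes requested; only meaningful for "alloc" events


# Illustrative thresholds (assumptions).
LARGE_ALLOC_BYTES = 1 << 20  # allocations of 1 MiB or more count as "large"
MAX_LARGE_ALLOCS = 50        # this many large allocations counts as "concentrated"


def trace_is_malicious(events: Iterable[TraceEvent]) -> bool:
    """Flag a trace that creates an ActiveX control or makes many large allocations."""
    large_allocs = 0
    for event in events:
        if event.kind == "create_activex":
            return True
        if event.kind == "alloc" and event.size >= LARGE_ALLOC_BYTES:
            large_allocs += 1
            if large_allocs >= MAX_LARGE_ALLOCS:
                return True
    return False
```

Since the trace is recorded while the script runs, the check sees what the code does after any deobfuscation or decryption has taken place, which is why this approach copes with obfuscated scripts.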
The honeypot technique, which includes client honeypots, actively launches client software to access servers and monitors whether any abnormal behavior occurs. It targets possible security weaknesses in the client software and serves both research and protection purposes. Client honeypots combine the honeypot technique with the spider (web crawler) technique and are used predominantly for Web browsers and e-mail clients: spiders crawl URLs on the network so that potential malicious software executed through the client software can be found.
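The combination of spider and client honeypot described above can be sketched as a crawl loop. The `visit` callback is a hypothetical hook that stands in for opening a URL in instrumented client software; how abnormal behavior is actually observed is outside the scope of this sketch, and the `max_pages` limit is an added assumption.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

# visit(url) is a hypothetical hook: it opens the URL in instrumented client
# software and returns (abnormal_behavior_seen, links_found_on_page).
VisitFn = Callable[[str], Tuple[bool, Iterable[str]]]


def crawl_with_honeypot(
    seeds: Iterable[str], visit: VisitFn, max_pages: int = 100
) -> List[str]:
    """Breadth-first crawl; return the URLs whose visit showed abnormal behavior."""
    queue = deque(seeds)
    seen: Set[str] = set(queue)  # URLs already queued, to avoid revisiting
    suspicious: List[str] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        abnormal, links = visit(url)
        if abnormal:
            suspicious.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return suspicious
```

Passing `visit` in as a parameter keeps the crawl logic separate from the monitored client, so the same loop could drive a browser or an e-mail client.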
However, as hackers' techniques for writing malicious code improve, malicious code is becoming increasingly subtle, while the traditional methods for detecting a malicious attribute of a webpage remain comparatively simple. This makes it hard to detect newly emerging malicious code in time, and a malicious webpage can cause harm before its features are identified. Traditional methods for detecting a malicious attribute of a webpage therefore have low accuracy.