The Internet enriches people's lives, but a growing number of pornography, fraud, and phishing websites have also emerged, posing a serious security threat to the majority of Internet users when they obtain information online. Thus, a detection engine for identifying malicious URLs is needed.
An existing URL cloud detection engine may effectively identify whether a URL accessed by a user exhibits malicious behavior and prompt the user accordingly. After the user inputs a URL to be accessed and before the browser displays the corresponding page content, the URL cloud detection engine must obtain the malicious attributes of the URL from a cloud detection center, identify whether the URL exhibits malicious behavior, and provide relevant prompts based on the identification result. Because malicious websites change constantly, the URL cloud detection engine must be fast, efficient, and accurate, so as to ensure that malicious websites are found in a timely and accurate manner.
An existing URL cloud detection engine may identify malicious attributes from the text information of the page's DOM and BOM objects, using machine learning techniques such as a Bayes classifier, keyword filtering, and similarity matching. Although this technology may effectively identify text-based malicious and fraudulent websites, it cannot effectively identify non-text web content.
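As an illustration of the text-based identification described above, the following is a minimal sketch of a Bayes classifier over page text. It is not the implementation of any particular detection engine: the class name `NaiveBayes`, the helper `tokenize`, the two labels, and the four training sentences are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

LABELS = ("malicious", "benign")  # hypothetical label set for this sketch


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text, label):
        """Return log P(label) + sum of log P(word | label) for the text."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total = sum(self.word_counts[label].values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            logp += math.log((count + 1) / (total + len(vocab)))
        return logp

    def classify(self, text):
        return max(LABELS, key=lambda label: self.score(text, label))


# Toy training data (assumed; a real engine would train on labelled pages).
nb = NaiveBayes()
nb.train("verify your account password now", "malicious")
nb.train("win free prize claim password", "malicious")
nb.train("weather forecast for the weekend", "benign")
nb.train("recipe for chocolate cake", "benign")

print(nb.classify("claim your free prize"))
print(nb.classify("chocolate cake recipe for the weekend"))
```

Note that when a page yields no extractable text, the per-word terms vanish and only the class priors remain, so a classifier of this kind has no evidence to work with. This is the limitation for non-text content noted above.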
Moreover, malicious pages in the prior art may evade detection by the detection engine through the following means.
(1) Text content is converted into an image. The contents of the whole page are rendered as a single image, so that the page contains no text for the detection engine to analyze.
(2) Plaintext is encrypted and hidden. Since current detection engines rely mainly on the text information of the page, authors of malicious webpages process the plaintext information using encryption technology. When encountering an encrypted string without any semantics, the identification module of the detection engine cannot effectively identify the malicious webpage.
(3) Streaming media is used to counter the detection engine. In order to avoid identification by current detection technology, existing malicious webpages hide the text information and display it within streaming media, such as Flash. Existing detection technology may thus be evaded effectively.
(4) Normal text information is used to interfere with the detection engine. In order to evade existing detection technology, a large amount of normal text that is not displayed may be added to the page contents to interfere with the identification module.
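The effect of means (1), (2), and (4) on a text-based filter can be sketched as follows. This is a simplified keyword-density filter written for illustration only: the keyword set `SUSPICIOUS`, the threshold of 0.2, and the sample strings are assumptions, and Base64 encoding is used merely as a stand-in for whatever encryption a malicious page might apply.

```python
import base64
import re

# Hypothetical keyword list and threshold, chosen only for this sketch.
SUSPICIOUS = {"password", "prize", "verify", "claim"}
THRESHOLD = 0.2


def keyword_density(text):
    """Fraction of word tokens in the text that are suspicious keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in SUSPICIOUS for token in tokens) / len(tokens)


def is_flagged(text, threshold=THRESHOLD):
    return keyword_density(text) >= threshold


# Plain malicious text: 4 of 7 tokens are suspicious, so it is flagged.
visible = "verify your password to claim your prize"

# Means (1): the whole page is an image, so no text is extracted at all.
image_only = ""

# Means (2): the text is transformed into a string without semantics.
encoded = base64.b64encode(visible.encode("ascii")).decode("ascii")

# Means (4): undisplayed normal text dilutes the suspicious keywords.
padding = " ".join(["weather forecast recipe garden travel"] * 10)
padded = visible + " " + padding

for name, text in [("visible", visible), ("image_only", image_only),
                   ("encoded", encoded), ("padded", padded)]:
    print(name, round(keyword_density(text), 3), is_flagged(text))
```

In this sketch only the unmodified text is flagged; the image-only, encoded, and padded variants all fall below the threshold, even though a user would see the same content in each case.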
Therefore, how to detect malicious URLs efficiently and accurately has become a difficult problem and a challenge for detection technology.