Due to the increased popularity and use of the World Wide Web, web users and their computing systems have become more exposed to cyber attacks. Malicious Uniform Resource Locators (URLs) are widely used to perform cyber attacks on web users and their computing systems. Malicious URLs include phishing URLs, spamming URLs and malware URLs.
Phishing typically involves sending an email intended to deceive a recipient into clicking on a malicious URL that links to an illegitimate web page, instead of an authentic web page. Spamming may involve sending or providing users with unsolicited information via a malicious URL which has been configured to manipulate the relevance or prominence of resources indexed by a search engine. Malware typically involves using a malicious URL to secretly access and infect a computing system without the owner's informed consent or knowledge.
The detection of malicious URLs limits web-based attacks by preventing web users from visiting malicious URLs or by warning web users before they access content located at a malicious URL. Thus, malicious URL detection protects computing system hardware and software from computer viruses, prevents execution of malicious or unwanted software, and helps web users avoid accessing malicious URLs that they do not want to visit.
Conventional systems employ various sources (e.g., human feedback) to build a blacklist, which is a set list of known malicious URLs. Blacklisting identifies a malicious URL by matching a received URL against the URLs on the blacklist. Although blacklisting is an effective means of identifying a known malicious URL, it cannot detect unknown malicious URLs. Therefore, cyber attackers can easily evade conventional blacklisting systems by continuously modifying the manner in which malicious URLs are configured, thereby finding new approaches to attack web users, web browsers, search engines and the like.
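The blacklist matching described above, and its blind spot for unknown URLs, can be illustrated with a minimal sketch. This is not drawn from any particular conventional system: the blacklist entries, the `example.test` hosts, and the helper names `normalize` and `is_blacklisted` are all hypothetical, and the normalization step is an assumption added only so that trivial case differences still match.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of known malicious URLs (placeholder hosts).
BLACKLIST = {
    "http://phish.example.test/login",
    "http://malware.example.test/download.exe",
}


def normalize(url: str) -> str:
    """Lowercase scheme and host; keep the path; ignore query and fragment.

    This is an illustrative assumption, not a standard canonical form.
    """
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"


def is_blacklisted(url: str, blacklist=BLACKLIST) -> bool:
    """Return True only if the normalized URL is already on the blacklist."""
    return normalize(url) in blacklist


# A known malicious URL is caught, even with different letter case.
print(is_blacklisted("HTTP://Phish.Example.Test/login"))   # True

# A slightly reconfigured URL is not on the list, so it goes undetected.
print(is_blacklisted("http://phish2.example.test/login"))  # False
```

The second call shows the limitation noted above: changing a single character of the host yields a URL that the list has never seen, so a lookup-only approach reports it as not malicious.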