Due to the increased popularity and use of the World Wide Web, web users and their computing systems have become more exposed to cyber attacks and security threats. Malicious Uniform Resource Locators (URLs) are widely used to carry out such attacks against web users and their computing systems. For example, users who access malicious URLs may be subjected to phishing attacks, spamming attacks, and malware attacks.
Phishing is a cyber attack, and therefore a security threat, that attempts to acquire sensitive or private information from unsuspecting victims (e.g., user names, user passwords, social security numbers, birthdates, credit card numbers, etc.). For example, phishing may involve sending an email intended to deceive a recipient into clicking on a malicious URL that locates, or points to, an illegitimate or counterfeit resource (e.g., a Web site or Web page). The illegitimate or counterfeit resource may be visually similar to an authentic resource. The recipient may then unknowingly provide sensitive and private information to the illegitimate or counterfeit resource because the recipient incorrectly believes that it is the authentic resource.
Spamming may involve sending or providing users with unsolicited information via a malicious URL that has been configured to manipulate the relevance or prominence of resources indexed by a search engine. Malware typically involves using a malicious URL to secretly access and infect a computing system without the owner's informed consent or knowledge.
Conventional systems for detecting malicious URLs and limiting cyber attacks and security threats build a blacklist from various sources (e.g., human feedback or classification). A blacklist is a list of known malicious URLs. Blacklisting identifies a malicious URL by matching a received URL against the URLs on the blacklist, and then blocks the received URL when a match occurs. Although blacklisting is an effective means of identifying known malicious URLs, it cannot detect unknown malicious URLs that are not on the list. Attackers can therefore easily evade conventional blacklisting systems by continually altering the way their malicious URLs are constructed so that they do not produce a blacklist match.
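The matching step described above, and why a small alteration defeats it, can be illustrated with a minimal sketch. It assumes exact string matching after light normalization (lowercasing the scheme and network location); the helper names `normalize` and `is_blocked` and the `.example` URLs are hypothetical and do not describe any particular conventional system.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and network location; keep path, query, fragment as-is."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def is_blocked(url: str, blacklist: set[str]) -> bool:
    """Block a URL only if its normalized form exactly matches a blacklist entry."""
    return normalize(url) in blacklist


# Hypothetical blacklist containing one known malicious URL.
BLACKLIST = {normalize("http://phish.example/login")}

print(is_blocked("HTTP://PHISH.example/login", BLACKLIST))       # known URL: matched
print(is_blocked("http://phish.example/login?id=1", BLACKLIST))  # altered URL: missed
```

The second lookup shows the weakness noted above: appending a single query parameter produces a URL that no longer matches any entry, so the altered malicious URL passes through.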
In contrast to blacklisting, some conventional systems use whitelisting to identify known benign web sites by maintaining a list of URLs and/or domains that are known to be threat free. However, whitelisting is not a desirable countermeasure to malicious URLs because it unavoidably blocks benign URLs and/or domains that are not included in the whitelist.
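The drawback of whitelisting can likewise be shown with a minimal sketch, assuming a domain-level whitelist in which a host is allowed if it equals a listed domain or is a subdomain of one. The function `is_allowed` and the `.example` domains are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, whitelist_domains: set[str]) -> bool:
    """Allow a URL only if its host is a whitelisted domain or a subdomain of one."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return host in whitelist_domains or any(
        host.endswith("." + domain) for domain in whitelist_domains
    )


# Hypothetical whitelist with a single trusted domain.
WHITELIST = {"bank.example"}

print(is_allowed("https://www.bank.example/login", WHITELIST))    # listed: allowed
print(is_allowed("https://new-benign-site.example/", WHITELIST))  # benign but unlisted: blocked
```

Any benign site that has not yet been added to the list is rejected, which is the unavoidable over-blocking described above.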