A Uniform Resource Locator (URL), often referred to as a “web address,” identifies a resource accessible via the World Wide Web. Some URLs, however, may direct users to malicious content. URL analysis systems use various security techniques to identify malicious URLs (i.e., URLs that identify web resources containing malicious content). One goal of malicious URL analysis is to identify a malicious URL before a user visits the web resource identified by that URL or otherwise downloads content from that web resource. However, identifying malicious URLs can be difficult because the content of many web resources is dynamic and a malicious URL often has a short lifespan. As such, typical URL analysis systems often struggle to keep up with the growing number of URLs requiring analysis.
Typical URL analysis systems use various analysis techniques including, for example, maintaining “blacklists” that identify known malicious URLs. However, maintaining “blacklists” and corresponding “whitelists” is time intensive, and because malicious URLs are transient, such lists inevitably omit some known malicious URLs at any given point in time. Other URL analysis systems analyze the content located at the web resource identified by the URL. Again, such analysis, especially when done in real time, is time and resource intensive and can expose a user to potential threats. Additionally, some analysis systems identify malicious URLs based on non-content analysis using human-crafted rules. Such analysis systems, however, are difficult to scale and slow to adapt to new or emerging threats.
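For illustration only, the list-based and rule-based techniques described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the approach of any particular system: the list entries, the function names (`host_of`, `is_ip_literal`, `classify`), and the rule thresholds are all hypothetical and chosen only to make the example concrete.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical list contents; a real system would load these from a maintained feed.
BLACKLIST = {"malicious.example"}
WHITELIST = {"trusted.example"}


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or "" if none can be parsed."""
    return (urlsplit(url).hostname or "").lower()


def is_ip_literal(host: str) -> bool:
    """Return True if the host is a raw IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def classify(url: str) -> str:
    """Classify a URL without fetching the content it identifies."""
    host = host_of(url)
    if host in WHITELIST:
        return "benign"
    if host in BLACKLIST:
        return "malicious"
    # Human-crafted, non-content rules (illustrative assumptions only):
    # a raw IP address as host, an unusually deep subdomain chain,
    # or userinfo ("@") embedded in the authority component.
    if is_ip_literal(host) or host.count(".") > 4 or "@" in urlsplit(url).netloc:
        return "suspicious"
    return "unknown"
```

In this sketch, a newly created malicious URL whose host is on neither list and trips no rule is reported as "unknown", which mirrors the coverage gap described above. Likewise, every new evasion pattern would require another hand-written rule, which is why such rule sets are hard to scale.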