Because of their prevalence in today's society, the internet and other types of networks have become hubs for criminal activity. Criminal enterprises and/or actors often attempt to install malware or other harmful software on systems used by unsuspecting users by directing those users to malicious resources identified by uniform resource identifiers (URIs) (e.g., malicious web addresses). It would therefore be helpful for users to know which URIs are malicious and which are safe (non-malicious) before accessing the corresponding resources.
Current technology primarily uses rule-based filtering to identify malicious URIs. For example, Proofpoint URL Defense checks a URI against a set of blacklists and internet protocol (IP) reputation scores. If a URI is on a blacklist or is associated with a questionable domain, it is classified as malicious. However, this technique is static and does not readily adapt to new or evolving threats. Other proposed techniques build predictive models that classify malicious website URIs based on certain features indicative of suspicious URIs.
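The rule-based approach described above can be illustrated with a minimal, generic sketch. It is not Proofpoint's implementation. The blacklist contents, the IP reputation table, the 0.5 threshold, and the `classify_uri` function are all hypothetical values chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical static inputs; a real filter would load these from maintained feeds.
BLACKLIST = {"malicious.example"}
IP_REPUTATION = {"203.0.113.7": 0.1, "198.51.100.2": 0.9}
REPUTATION_THRESHOLD = 0.5  # assumed cutoff: scores below it count as "questionable"


def classify_uri(uri, resolved_ip=None):
    """Classify a URI using only fixed rules: a blacklist and IP reputation scores."""
    host = urlsplit(uri).hostname or ""  # urlsplit already lowercases the hostname
    # Rule 1: the host appears on the blacklist.
    if host in BLACKLIST:
        return "malicious"
    # Rule 2: the host resolves to an IP with a poor reputation score.
    # Unknown IPs default to 1.0, i.e. they are treated as trustworthy.
    if resolved_ip is not None and IP_REPUTATION.get(resolved_ip, 1.0) < REPUTATION_THRESHOLD:
        return "malicious"
    # No rule fired, so the URI is treated as safe.
    return "safe"
```

Because the rules consult only fixed lists, a URI on a newly registered domain that resolves to an unlisted IP, such as `classify_uri("https://new-threat.example/", "192.0.2.1")`, is classified as safe. This reflects the static limitation noted above.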
A need exists, therefore, for methods and systems that use a plurality of diverse features to classify URIs, including new URIs.