One major problem facing modern computing systems and communications systems is the prevalence of spam messages.
One common form of spam is a message that includes a URL that, when activated, links to one or more websites that include unsolicited, malicious, unwanted, offensive, or nuisance content, such as, but not limited to: any content that promotes and/or is associated with fraud; any content that includes “work from home” or “be our representative” offers/scams; any content that includes money laundering or so-called “mule spam”; any content that promotes and/or is associated with various financial scams; any content that promotes and/or is associated with any other criminal activity; and/or any content that is unsolicited and/or undesirable, whether illegal in a given jurisdiction or not.
Traditional anti-spam techniques are largely heuristic and involve identifying spam based on analyzing known spam and then creating spam signature files/parameters. Using these traditional anti-spam techniques, a percentage of new spam messages does pass through to “sacrificial” end user machines before the spam is identified as spam and spam signature files/parameters can be created, but, traditionally, once the spam signature files/parameters are identified, future instances of the spam are blocked.
The traditional anti-spam techniques that use spam signature files/parameters are effective so long as the spam signature remains relatively constant for a significant number of messages, i.e., the traditional spam signature files/parameters only work as long as the spam message parameters continue to match the spam signature files/parameters. However, traditional anti-spam heuristic technology is reaching maturity, and spammers now have more knowledge, and skill, than ever before regarding how to avoid detection by traditional anti-spam techniques.
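The dependence of signature-based blocking on an unchanging message can be illustrated with a minimal sketch. The signature used here, a SHA-256 hash of the lowercased, whitespace-normalized message body, is an assumption chosen only for illustration, and the names `signature` and `SignatureFilter` are hypothetical; actual spam signature files/parameters may be built quite differently.

```python
import hashlib


def signature(body: str) -> str:
    """Hypothetical signature: SHA-256 of the lowercased, whitespace-normalized body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureFilter:
    """Toy filter that blocks only messages whose signature was seen before."""

    def __init__(self) -> None:
        self.known_signatures: set[str] = set()

    def learn(self, spam_body: str) -> None:
        # Called after a message has already been identified as spam.
        self.known_signatures.add(signature(spam_body))

    def is_spam(self, body: str) -> bool:
        # Matches only while the message parameters still match a stored signature.
        return signature(body) in self.known_signatures
```

In this sketch, a message that differs from known spam only in case or spacing is still blocked, while a message with even one changed word produces a different signature and passes through until it, too, has been seen and learned.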
In particular, spammers have recently begun sending very short spam messages, often from semi-legitimate sources, such as webmail providers. As a result, the headers of these spam messages are often legitimate, or semi-legitimate, and the source of the message cannot be readily used as a spam signature parameter without the risk of numerous false positive, i.e., false spam, verdicts.
In addition, many of these very short spam messages contain a URL link and little, or nothing, more. Consequently, these spam messages often include little, or no, spam-related content in the body of the message that could be used to create spam signature files or parameters.
As a result of the situation described above, in more and more cases, the only spam signature parameter available is the URL link included in the message itself. However, the prevalence of URL shortening services has significantly complicated traditional URL analysis. URL shortening services typically provide users, including spammers, the ability to reduce the length, i.e., the number of characters, of a given URL by providing shortened URLs that map to the longer actual URL. URL shortening services are legitimately used to allow a URL to be included in communications with limited text size, such as Twitter™ messages. On the other hand, spammers can use URL shortening services to mask an actual spam URL by having multiple shortened URLs created that map to the actual URL.
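The masking effect described above can be sketched as follows. Everything in this example is hypothetical and for illustration only: `ToyShortener` stands in for a URL shortening service, `short.example` and `spam.example` are placeholder domains, and `BLOCK_LIST` is an assumed exact-match URL block list.

```python
import itertools


class ToyShortener:
    """Toy stand-in for a URL shortening service (not any real service's API)."""

    def __init__(self, domain: str = "short.example") -> None:
        self._domain = domain
        self._counter = itertools.count(1)
        self._targets: dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        # Every request yields a new short URL, even for the same target URL.
        short_url = f"http://{self._domain}/{next(self._counter):x}"
        self._targets[short_url] = long_url
        return short_url

    def resolve(self, short_url: str) -> str:
        # The mapping back to the actual URL is held by the service.
        return self._targets.get(short_url, short_url)


# Assumed exact-match block list containing the actual spam URL.
BLOCK_LIST = {"http://spam.example/offer"}


def is_blocked(url: str) -> bool:
    """Return True only if the URL exactly matches a block list entry."""
    return url in BLOCK_LIST
```

Under these assumptions, two calls to `shorten` with the same spam URL return two distinct shortened URLs, neither of which matches the block list entry, even though both map to the blocked actual URL.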
Because of the prevalence of URL shortening services, and because spammers are tending towards using a given URL in only a very small number of messages, in some cases in only a single message, traditional URL-based spam signatures, and associated URL block lists, are ineffective: each URL-based spam signature will block only a very small number of messages, if any, before it becomes irrelevant. Consequently, reactively blocking a URL after the message containing it has been sent is often entirely ineffective for this type of spam.
Creating very short spam messages, sending them from semi-legitimate sources, and including shortened URLs that each appear in very few spam messages is relatively simple and often cost free to the spammer. Consequently, from a spammer's perspective, these methods of distributing spam, and defeating spam protection systems, are extremely effective. As a result, using currently available spam detection systems and methods, many types of spam, and particularly newer forms of spam, are extremely difficult to identify and isolate and, therefore, many of these nuisance, and at times harmful, messages still find their way to thousands of victims each year. Clearly, this is a far from ideal situation for the victims, but it is also a problem for all users of e-mail, and other message systems, who must suffer the delays caused by false positives and/or must be wary of all messages, even those of seemingly legitimate origin and intent.