Internet advertising is one of the cheapest forms of advertising. Spam is the most prevalent type of Internet advertising and accounts for 70% to 90% of the total volume of email traffic. Spam is the mass sending of advertising or other information to recipients who did not solicit it. Spam includes messages sent by email, over instant messaging protocols, and on social networks, blogs, dating websites, and forums, as well as SMS and MMS messages.
The growing volume of spam raises a number of technical, economic, and criminal issues. These include heavy loads on data transfer equipment and other resources, the time users spend processing messages, and a shift in message content towards fraud and theft. Plainly, there is an urgent need to detect and control spam.
There are many ways to counter spam. One of the most effective is to use anti-spam applications that identify and remove undesirable messages. Such applications rely on filtering methods that are often based on analyzing word combinations in the message text and checksums computed from those word combinations.
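The checksum-based filtering described above can be illustrated with a minimal Python sketch. It is not taken from any particular anti-spam product: the function names, the three-word window, and the choice of CRC32 as the checksum are all assumptions made for illustration.

```python
import re
import zlib


def shingle_checksums(text, size=3):
    """Return CRC32 checksums of every run of `size` consecutive words.

    The window size and the CRC32 checksum are illustrative assumptions.
    """
    # Normalize: lowercase and keep only alphanumeric word tokens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        zlib.crc32(" ".join(words[i:i + size]).encode("utf-8"))
        for i in range(len(words) - size + 1)
    }


def spam_score(text, known_spam_checksums, size=3):
    """Fraction of the message's word-combination checksums found in the known set."""
    sums = shingle_checksums(text, size)
    if not sums:
        return 0.0  # message shorter than one window: nothing to compare
    return len(sums & known_spam_checksums) / len(sums)


# Hypothetical database built from one previously seen spam message.
known = shingle_checksums("buy cheap pills online now")
print(spam_score("Buy cheap pills online now!!!", known))
print(spam_score("meeting moved to friday afternoon", known))
```

Storing checksums rather than the word combinations themselves keeps the database compact and makes lookups a simple set intersection, at the cost of missing messages whose wording has been altered even slightly.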
For example, U.S. Pat. No. 7,555,523 describes a system in which letter sequences of a message body are analyzed using n-grams of various lengths. Whether a message contains spam is decided by searching a database of spam-containing sequences for similar sequences.
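The general idea of matching character n-grams of several lengths against a database of spam sequences might be sketched as follows. This is only an illustration of the technique, not the method of U.S. Pat. No. 7,555,523: the function names, the n-gram lengths (3, 4, 5), the exact-match lookup, and the 0.5 threshold are all hypothetical choices.

```python
def char_ngrams(text, lengths=(3, 4, 5)):
    """Return the set of character n-grams of each requested length."""
    # Normalize: lowercase and collapse runs of whitespace to single spaces.
    normalized = " ".join(text.lower().split())
    return {
        normalized[i:i + n]
        for n in lengths
        for i in range(len(normalized) - n + 1)
    }


def matches_spam_database(text, spam_ngrams, lengths=(3, 4, 5), threshold=0.5):
    """Flag a message when enough of its n-grams occur in the spam database.

    The threshold is an assumed tuning parameter, not a value from the source.
    """
    grams = char_ngrams(text, lengths)
    if not grams:
        return False  # message too short to yield any n-gram
    overlap = len(grams & spam_ngrams) / len(grams)
    return overlap >= threshold


# Hypothetical database built from one known spam message.
spam_db = char_ngrams("win a free prize today")
print(matches_spam_database("WIN a FREE   prize today", spam_db))
print(matches_spam_database("quarterly report attached", spam_db))
```

Because character n-grams overlap, a message that differs from known spam by a few inserted or changed characters still shares most of its n-grams with the database, which is what makes this kind of comparison more tolerant of small edits than whole-word checksums.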