Advertising on the Internet is one of the cheapest forms of advertising. Spam is a mass mailing of advertising or other information to people who have not expressed a desire to receive it. Spam includes messages sent by electronic mail, instant messaging protocols, social networks, blogs, dating sites, and forums, as well as SMS and MMS messages. Spam messages have become the main and most large-scale form of advertising in the modern world and account for around 70-90% of the total volume of global mail traffic.
Given the continual growth in the volume of spam mailings, problems of a technical, economic and criminal nature arise. The excess network traffic caused by spam messages may overload data transmission channels and network equipment; reviewing and handling spam messages wastes users' time; and spam messages are used to perpetrate fraud and theft. These and other aspects show the acute need to combat spam.
Many methods exist for counteracting spam mailings. One of the most effective is the use of anti-spam applications, which detect, filter and remove unwanted spam messages. One of the key conditions for spam filtering is to avoid false spam detections, which result in the blocking of legitimate messages. For example, the black list method, which essentially involves removing messages arriving from addresses contained in a black list, provides 100% filtering of messages from blacklisted addresses. However, when addresses of ordinary users mistakenly end up on the black list, a false spam detection may occur, and legitimate messages may be filtered out and not delivered to their destination.
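The black list behavior described above, including the false detection case, can be illustrated with a minimal sketch. The function name `filter_by_blacklist`, the `(sender, body)` message representation, and the example addresses are hypothetical and chosen only for illustration; they do not describe any particular anti-spam application.

```python
def filter_by_blacklist(messages, blacklist):
    """Split (sender, body) pairs into delivered and blocked lists.

    A message is blocked when its sender address appears in the black list.
    Addresses are compared case-insensitively (an assumption of this sketch).
    """
    blocked_senders = {addr.strip().lower() for addr in blacklist}
    delivered, blocked = [], []
    for sender, body in messages:
        if sender.strip().lower() in blocked_senders:
            blocked.append((sender, body))
        else:
            delivered.append((sender, body))
    return delivered, blocked


messages = [
    ("promo@bulk-sender.example", "Cheap watches, buy now"),
    ("alice@example.org", "Meeting moved to 3 pm"),
]
# The second address belongs to an ordinary user and is listed by mistake.
blacklist = ["promo@bulk-sender.example", "alice@example.org"]

delivered, blocked = filter_by_blacklist(messages, blacklist)
```

In this sketch every message from a listed address is removed, so the legitimate message from the mistakenly listed user is blocked together with the spam message, which is the false detection the black list method is prone to.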
Another method of counteracting spam is content filtering, which involves the use of special spam filters that analyze the constituent parts of messages, including graphics. From the results of the analysis, a lexical vector or a spam weight of the message may be calculated, which can be used to determine whether the message is spam.
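As a rough illustration of how a spam weight might be derived from message content, the following sketch sums per-token weights and compares the total against a threshold. The token weights, the threshold value, and the function names are assumptions made for this example only; practical content filters use far richer features and rules than this.

```python
import re

# Hypothetical per-token weights: positive values suggest spam,
# negative values suggest legitimate mail.
TOKEN_WEIGHTS = {
    "free": 2.0,
    "winner": 3.0,
    "prize": 2.5,
    "meeting": -1.5,
    "invoice": -0.5,
}
SPAM_THRESHOLD = 4.0  # assumed cut-off for this sketch


def spam_weight(text, weights=TOKEN_WEIGHTS):
    """Return the sum of the weights of all known tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def is_spam(text, threshold=SPAM_THRESHOLD):
    """Classify the text as spam when its weight reaches the threshold."""
    return spam_weight(text) >= threshold
```

For instance, under these assumed weights the text "You are a WINNER, claim your free prize" contains three weighted tokens and exceeds the threshold, while "Meeting notes attached" receives a negative weight and is not classified as spam.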
Another spam detection technique is message clustering, which detects mass messages in the mail flow that are identical or differ only slightly. The drawback of this method is that the majority of legitimate services, such as news subscription or update services, also employ mass mailing and, consequently, can be incorrectly recognized as sources of a spam mailing when this method is used.
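One simple way to group identical or slightly differing messages is sketched below, using word shingles and Jaccard similarity with a greedy assignment to clusters. This is one possible technique chosen for illustration, not necessarily the one used by any existing clustering system; the shingle size, the similarity threshold, and all function names are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_messages(messages, threshold=0.5):
    """Greedily group messages whose shingle sets are similar.

    Each message joins the first cluster whose representative (the first
    message placed in it) is at least `threshold` similar; otherwise it
    starts a new cluster. Returns lists of message indices.
    """
    clusters = []  # pairs of (representative shingle set, member indices)
    for index, text in enumerate(messages):
        current = shingles(text)
        for representative, members in clusters:
            if jaccard(representative, current) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((current, [index]))
    return [members for _, members in clusters]


mail_flow = [
    "buy cheap watches online today at our store",
    "buy cheap watches online today at our shop",
    "lunch on friday at noon works for me",
]
groups = cluster_messages(mail_flow)
```

Here the first two messages differ in a single word and fall into the same cluster, while the third forms its own. A legitimate newsletter sent to many subscribers would form a large cluster in exactly the same way, which is the drawback noted above.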
Anti-spam laboratories are engaged in creating and improving the filtering rules used by spam filters. At the same time, the people engaged in spam mailing are constantly making attempts to bypass the protection of spam filters. The existing methods of counteracting spam have a number of shortcomings and cannot fully resolve the problem.
Therefore, there is a need to improve spam detection techniques.