1. Field of the Invention
The present invention relates to the field of information processing, and more particularly, to a method and apparatus for spam message detection.
2. Description of the Related Art
Spam messages and spam mail degrade user experience and system performance. A variety of approaches exist for detecting spam messages. One is a user feedback based approach, in which users identify and report spammers. Another is a social network based approach, in which a social network archive is established for each user, and a message transmitted by the user to other users outside of that social network is determined to be a spam message. Both approaches require a relatively large data record system to store the reported spammers or the social network archives, and such a data record system needs to be shared among various service operators, which makes these approaches difficult to deploy across service operators.
According to a message content based approach, a message is determined to be a spam message if it contains a preset keyword. In this approach, an excessively small set of keywords causes a high false negative rate, while an excessively large set of keywords reduces detection speed. Because the approach inspects message content, it may also raise privacy concerns. In addition, a spammer can easily escape detection by simple and flexible means, such as inserting a space within a keyword.
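By way of illustration only, the message content based approach and the space-insertion evasion described above can be sketched as follows. The keyword set, the function name, and the sample messages are hypothetical and are not taken from any particular system; the sketch assumes a simple case-insensitive substring match.

```python
# Hypothetical keyword set, chosen only for illustration.
KEYWORDS = {"lottery", "prize"}


def is_spam_by_keyword(message: str, keywords=KEYWORDS) -> bool:
    """Flag a message if it contains any preset keyword.

    Assumes a case-insensitive substring match; a real filter may
    tokenize or normalize the text differently.
    """
    text = message.lower()
    return any(keyword in text for keyword in keywords)


# A message containing a preset keyword is flagged.
print(is_spam_by_keyword("You won a lottery prize"))

# Inserting a space within each keyword defeats the substring match.
print(is_spam_by_keyword("You won a lot tery pri ze"))
```

In this sketch the first message is flagged and the second is not, which illustrates the evasion. Each message is also checked against every keyword, so the cost per message grows with the size of the keyword set.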
In another approach based on message transmission speed, a message source is determined to be a spammer if it transmits bulk messages or repeated messages in a short span of time. However, a spammer can evade this approach by making multiple message sources transmit messages in turns, which reduces the number of messages transmitted by each message source within the short span of time. Conversely, a normal user may transmit bulk messages in a short span of time under some circumstances and thus be mistakenly identified as a spammer.
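By way of illustration only, the transmission speed based approach can be sketched as a per-source sliding-window counter. The class name, the threshold, and the window length are hypothetical; the sketch assumes that timestamps for each source arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class RateDetector:
    """Flags a source that sends more than max_messages within window_seconds.

    Hypothetical sketch; the threshold and window are illustrative values.
    """

    def __init__(self, max_messages: int, window_seconds: float):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # source -> recent timestamps

    def observe(self, source: str, timestamp: float) -> bool:
        """Record one message; return True if the source now exceeds the limit."""
        times = self._times[source]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_messages


# One source sending six messages in six seconds exceeds a limit of five.
single = RateDetector(max_messages=5, window_seconds=60.0)
print([single.observe("a", t) for t in range(6)])

# Three sources taking turns send the same six messages without being flagged.
rotating = RateDetector(max_messages=5, window_seconds=60.0)
print([rotating.observe(f"s{t % 3}", t) for t in range(6)])
```

In this sketch only the sixth message of the single source is flagged, while none of the rotated messages is flagged. A normal user who legitimately sends six messages within the window would be flagged in the same way as the single source.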