1. Field of the Invention
The present invention relates to the field of information processing and, more particularly, to a method and apparatus for spam short message detection.
2. Description of Related Art
Spam short messages seriously degrade user experience and system performance. A variety of approaches for detecting spam short messages already exist. In a user feedback-based approach, users identify and report spammers. In a social network-based approach, a social network archive is established for each user, and a short message sent by the user to a recipient outside that social network is determined to be a spam short message. The problem with both approaches is that a large data record system is required to store the reported spammers or the social network archives, and that data record system needs to be shared among various service operators, which is infeasible for the operators.
In a short message content-based approach, a short message is determined to be a spam short message if it contains a preset keyword. This approach has several problems. If the keyword set is too small, the false negative rate is high; if it is too large, detection speed suffers. Checking short message content may also raise privacy concerns. Moreover, a spammer can evade detection with simple variations, such as inserting a space within a keyword.
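By way of illustration only, the keyword check described above and its evasion by an inserted space can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation drawn from any particular prior art system: the function name `contains_keyword`, the keyword set `KEYWORDS`, and the sample messages are all hypothetical.

```python
from typing import Iterable

# Hypothetical preset keyword set; a real filter would hold many more entries.
KEYWORDS = {"lottery", "free prize"}


def contains_keyword(message: str, keywords: Iterable[str]) -> bool:
    """Return True if the message contains any preset keyword (case-insensitive)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


# A message containing a preset keyword is flagged.
print(contains_keyword("You won the lottery!", KEYWORDS))   # True

# Inserting a space inside the keyword defeats the plain substring match.
print(contains_keyword("You won the lot tery!", KEYWORDS))  # False
```

Each keyword added to the set increases the work done per message, which reflects the trade-off between false negative rate and detection speed noted above.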
In an approach based on short message sending speed, a short message source is determined to be a spammer if it sends a large number of short messages in a short time. This approach also has drawbacks. A spammer can reduce the number of short messages sent by each source within a short time by making multiple short message sources send short messages alternately, while a normal user may legitimately send a large number of short messages in a short time under some circumstances.
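By way of illustration only, a sending-speed check of the kind described above can be sketched with a sliding time window per short message source. This is a minimal sketch under stated assumptions: the class name `RateDetector`, the helper `any_flagged`, the threshold of three messages per 60 seconds, and the source names "A" and "B" are hypothetical and chosen only to make the drawback visible.

```python
from collections import defaultdict, deque


class RateDetector:
    """Flags a source that sends more than max_messages within window_seconds."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # source -> timestamps inside the window

    def record(self, source, timestamp):
        """Record one message; return True if the source now exceeds the threshold."""
        times = self._sent[source]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_messages


def any_flagged(detector, events):
    """Feed (source, timestamp) events in order; True if any source is flagged."""
    return any([detector.record(source, t) for source, t in events])


# Six messages in six seconds from a single source.
single_source = [("A", t) for t in range(6)]
# The same six messages, sent alternately by two sources.
alternating = [("A" if t % 2 == 0 else "B", t) for t in range(6)]

print(any_flagged(RateDetector(3, 60), single_source))  # True
print(any_flagged(RateDetector(3, 60), alternating))    # False
```

The same total traffic is flagged when it comes from one source but passes when it is split across two, which is the avoidance described above. Lowering the threshold to catch the split traffic would in turn flag normal users who occasionally send several messages in quick succession.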
Thus, existing spam short message detection methods that are based on a single short message source characteristic are either impractical to deploy or easy to evade.