1. Field of the Invention
The present invention relates generally to electronic mail (“e-mail”), and more particularly but not exclusively to identification of spam e-mails.
2. Description of the Background Art
E-mail provides a convenient, fast, and relatively cost-effective way of sending messages to a large number of recipients. It is thus no wonder that solicitors, such as advertisers, use e-mail to indiscriminately send messages to e-mail accounts accessible over the Internet. These unsolicited e-mails, also referred to as “junk mail” or “spam”, are not only a nuisance, but also translate to lost time and money as employees or home users are forced to segregate them from legitimate e-mails.
Techniques for combating spam have been developed and made available to the general public. Most of these anti-spam techniques involve detection of texts typically employed by spammers. Because some texts are more indicative of spam than others, each text is assigned a weight. Some anti-spam engines assign weights using a so-called “genetic algorithm.” In operation, an incoming e-mail is checked for the presence of these texts. The weights of the texts found in the e-mail are then added up to generate a spam score. If the spam score is higher than a certain spam threshold, the e-mail is deemed spam. Otherwise, the e-mail is deemed legitimate.
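For illustration only, the weighted text-matching approach described above may be sketched as follows. The texts, weights, threshold value, and function names in this sketch are hypothetical and are not taken from any particular anti-spam engine; in practice the weights would be produced by a separate training step, such as the genetic algorithm mentioned above.

```python
# Minimal sketch of weighted text matching for spam scoring.
# All texts, weights, and the threshold below are assumed values
# chosen for illustration, not values from a real anti-spam engine.

TEXT_WEIGHTS = {
    "free money": 3.0,
    "act now": 2.0,
    "unsubscribe": 0.5,
}

SPAM_THRESHOLD = 4.0  # assumed threshold


def spam_score(body, weights=TEXT_WEIGHTS):
    """Add up the weights of every known text found in the e-mail body."""
    lowered = body.lower()  # case-insensitive matching is an assumption
    return sum(weight for text, weight in weights.items() if text in lowered)


def is_spam(body, weights=TEXT_WEIGHTS, threshold=SPAM_THRESHOLD):
    """Deem the e-mail spam only if its score is higher than the threshold."""
    return spam_score(body, weights) > threshold
```

Under these assumed values, an e-mail body containing both “free money” and “act now” would score 5.0 and be deemed spam, whereas a body containing only “act now” would score 2.0 and be deemed legitimate.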
One problem with existing anti-spam techniques is that the use of a genetic algorithm to assign weights to texts requires a relatively long computation time and may not provide good results against unsolicited e-mails that vary considerably from those used to compute the weights. Another problem with existing anti-spam techniques is that the use of text matching alone may be insufficient to determine whether an e-mail is spam or legitimate.