The invention relates to methods and systems for classifying electronic communications, and in particular to systems and methods for filtering unsolicited commercial electronic communications (spam).
Unsolicited commercial electronic communications, also known as spam, form a significant portion of all communication traffic worldwide, affecting both computer and telephone messaging services. Spam may take many forms, from unsolicited email communications to spam messages masquerading as user comments on Internet sites such as blogs and social network sites. Spam takes up valuable hardware resources, affects productivity, and is considered annoying and intrusive by many users of communication services and of the Internet.
In the case of email spam, software running on a user's or email service provider's computer system may be used to classify email messages as spam or non-spam, and even to discriminate between various kinds of spam messages (e.g., product offers, adult content, email scams). Spam messages can then be directed to special folders or deleted.
Similarly, software running on a content provider's computer systems may be used to intercept fraudulent messages posted to a website and prevent the respective messages from being displayed, or to display a warning to the users of the website that the respective messages may be spam.
Several approaches have been proposed for identifying spam messages, including matching the message's originating address to lists of known offending or trusted addresses (techniques termed black- and white-listing, respectively), searching for certain words or word patterns (e.g., refinancing, Viagra®, stock), and analyzing message headers. Feature extraction/matching methods are often used in conjunction with automated data classification methods (e.g., Bayesian filtering, neural networks).
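By way of illustration only, the address-list and word-pattern approaches mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular filter: the list contents, the pattern set, the function name classify, and the threshold parameter are all hypothetical and chosen only for the example.

```python
import re
from email.utils import parseaddr

# Hypothetical example data; a real filter would load these from maintained lists.
BLACKLIST = {"offers@spam.example"}      # known offending addresses
WHITELIST = {"colleague@work.example"}   # known trusted addresses
KEYWORD_PATTERNS = [
    re.compile(r"\brefinanc\w*", re.IGNORECASE),
    re.compile(r"\bstock\b", re.IGNORECASE),
]


def classify(sender: str, body: str, threshold: int = 1) -> str:
    """Label a message 'spam' or 'non-spam' using list lookups, then keyword patterns."""
    # Extract the bare address from a header value such as "Bob <bob@example.com>".
    _, address = parseaddr(sender)
    address = address.lower()

    # White-listing: trusted senders are accepted regardless of content.
    if address in WHITELIST:
        return "non-spam"
    # Black-listing: known offenders are rejected regardless of content.
    if address in BLACKLIST:
        return "spam"

    # Word-pattern search: count how many distinct patterns occur in the body.
    hits = sum(1 for pattern in KEYWORD_PATTERNS if pattern.search(body))
    return "spam" if hits >= threshold else "non-spam"
```

In this sketch the list lookups take precedence over the content check, so a trusted sender is never flagged on keywords alone. In practice, the pattern matches would more commonly serve as input features to an automated classifier such as a Bayesian filter rather than being compared against a fixed threshold.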
Spam often arrives in a rapid succession of groups of similar messages, also known as spam waves. The form and content of spam may change substantially from one spam wave to another; therefore, successful detection may benefit from methods and systems capable of quickly recognizing and reacting to new spam waves.