The invention relates to systems and methods for classifying electronic communications, and in particular to systems and methods for filtering unsolicited commercial electronic mail (spam).
Unsolicited electronic communications, also known as spam or junk mail, form a significant portion of all communication traffic worldwide, affecting both computer and telephone messaging services. Spam takes up valuable hardware resources, affects office productivity, and is considered annoying and intrusive by many recipients of such messages.
Software running on an email user's or email service provider's system may be used to classify email messages as spam or non-spam, and even to discriminate between various kinds of spam messages (e.g., product offers, adult content, email scams). Spam messages can then be directed to special folders or deleted. Several approaches have been proposed for identifying spam messages, including matching the message's originating address to lists of known offending or trusted addresses (techniques termed black- and white-listing, respectively), searching for certain words or word patterns (e.g., refinancing, Viagra®, stock), and analyzing message headers. Feature extraction/matching methods are often used in conjunction with automated data classification methods (e.g., Bayesian filtering, neural networks).
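By way of illustration only, the address-list and word-pattern approaches mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular filter: the function name classify, the sample addresses, and the keyword patterns are hypothetical and do not come from any actual system.

```python
import re

# Hypothetical example data, chosen only for illustration.
BLACKLIST = {"offers@spam.example"}   # known offending addresses
WHITELIST = {"alice@corp.example"}    # known trusted addresses
KEYWORD_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\brefinanc\w*", r"\bstock\b")
]


def classify(sender, body):
    """Return 'spam' or 'non-spam' using list lookup, then keyword matching."""
    address = sender.strip().lower()
    # White-listing: a trusted sender is accepted regardless of content.
    if address in WHITELIST:
        return "non-spam"
    # Black-listing: a known offending sender is rejected outright.
    if address in BLACKLIST:
        return "spam"
    # Word-pattern search: flag the message if any pattern occurs in the body.
    if any(pattern.search(body) for pattern in KEYWORD_PATTERNS):
        return "spam"
    return "non-spam"
```

In this sketch the trusted-address check is applied first, so a message from a white-listed sender is accepted even when its body matches a keyword pattern. That ordering is one possible design choice; a practical filter would typically combine such features with an automated classifier rather than rely on fixed rules.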
Spam often arrives in a rapid succession of groups of similar messages, also known as spam waves. The form and content of spam may change substantially from one spam wave to another. Since the effectiveness of anti-spam methods generally decreases over time, successful detection may benefit from methods and systems capable of quickly recognizing and reacting to new spam waves.