The invention relates to methods and systems for classifying electronic communications, and in particular to systems and methods for filtering unsolicited commercial electronic mail (spam).
Unsolicited commercial electronic communications have been placing an increasing burden on the users and infrastructure of electronic mail (email), computer messaging, and phone messaging systems. Unsolicited commercial communications, also known as spam, form a significant percentage of all email traffic worldwide. Spam takes up valuable network resources, affects office productivity, and is considered annoying, intrusive, and even offensive by many computer users.
Software running on an email user's or email service provider's system may be used to classify email messages as spam or non-spam (also called ham). Current methods of spam identification include matching the message's originating address to lists of known offending or trusted addresses (techniques termed black- and white-listing, respectively), and searching for certain words or word patterns (e.g. refinancing, Viagra®, weight loss).
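By way of illustration only, the list-based and keyword-based techniques described above may be sketched as follows. The sketch is a minimal one and makes several assumptions: the addresses, the patterns, and the function name `classify` are hypothetical, and the order of the checks (trusted list first, then offending list, then keyword search) is one possible choice rather than a required one.

```python
import re

# Hypothetical lists and patterns, chosen only for illustration.
BLACKLIST = {"offers@spam.example"}      # known offending addresses
WHITELIST = {"colleague@work.example"}   # known trusted addresses
KEYWORD_PATTERNS = [
    re.compile(r"\brefinanc\w*", re.IGNORECASE),
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bweight\s+loss\b", re.IGNORECASE),
]


def classify(sender: str, body: str) -> str:
    """Return 'spam' or 'ham' using list lookup first, then keyword search."""
    address = sender.strip().lower()
    if address in WHITELIST:
        return "ham"
    if address in BLACKLIST:
        return "spam"
    # Fall back to searching the message body for known word patterns.
    if any(pattern.search(body) for pattern in KEYWORD_PATTERNS):
        return "spam"
    return "ham"
```

In this sketch a message from a trusted address is accepted regardless of its content, whereas a message from an unknown address is classified solely on whether its body matches one of the word patterns.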
Spammers constantly develop countermeasures to such anti-spam methods. These countermeasures include misspelling certain words (e.g. Vlagra), using digital images instead of words, and inserting unrelated text in spam messages (also called Bayes poison). Spam identification may be further complicated by frequent changes in the form and content of spam messages.
Some spam identification systems use neural network filters. Neural network filters can be trained to learn a set of frequently occurring patterns of spam-identifying features (e.g., the presence of certain keywords within the message body), and are able to proactively combine multiple spam heuristics. In some instances, neural network training processes and associated filtering may produce sub-optimal spam-identification results.
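For illustration only, training a filter on keyword-presence features may be sketched with a single logistic unit, which is the simplest form of neural network and is not presented here as the filter of any particular system. The keyword list, the training messages, the learning rate, the number of epochs, and the function names are all hypothetical assumptions made for this sketch.

```python
import math

# Hypothetical feature set: one binary feature per keyword.
KEYWORDS = ["refinancing", "viagra", "weight loss"]


def features(body):
    """Map a message body to a vector of keyword-presence indicators."""
    text = body.lower()
    return [1.0 if keyword in text else 0.0 for keyword in KEYWORDS]


def predict(weights, bias, x):
    """Return the unit's output, read here as an estimated spam probability."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, lr=0.5):
    """Fit weights by stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(KEYWORDS)
    bias = 0.0
    for _ in range(epochs):
        for body, label in samples:
            x = features(body)
            error = predict(weights, bias, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical training corpus: label 1 is spam, label 0 is ham.
TRAINING = [
    ("Refinancing offer just for you", 1),
    ("Buy viagra online", 1),
    ("Weight loss miracle, refinancing too", 1),
    ("Lunch at noon?", 0),
    ("Quarterly report attached", 0),
]

weights, bias = train(TRAINING)
```

After training, `predict(weights, bias, features(body))` may be compared against a threshold such as 0.5 to label a message. Because the unit sees only the chosen keyword features, the quality of the result depends directly on the training corpus and the feature set.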