The invention relates to methods and systems for classifying electronic communications, and in particular to systems and methods for filtering unsolicited commercial electronic messages (spam).
Unsolicited commercial electronic communications have been placing an increasing burden on the users and infrastructure of electronic mail (email), instant messaging, and phone text messaging systems. Unsolicited commercial email, commonly termed spam or junk email, forms a significant percentage of all email traffic worldwide. Email spam takes up valuable network resources, affects office productivity, and is considered annoying and intrusive by many computer users.
Software running on an email user's or email service provider's system may be used to classify email messages as spam or non-spam. Several approaches have been proposed for identifying spam messages, including matching the message's originating address to lists of known offending or trusted addresses (techniques termed black- and white-listing, respectively), searching for certain words or word patterns (e.g., Viagra®, weight loss, aggressive buy), and analyzing message headers.
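By way of illustration only, the address-list and word-pattern approaches mentioned above may be sketched as follows. The list contents, the pattern set, and the function name `classify` are hypothetical and chosen for the example; header analysis is omitted. The sketch is not a description of any particular existing filter.

```python
import re

# Hypothetical address lists; a deployed filter would load these
# from a maintained source rather than hard-code them.
WHITELIST = {"colleague@company.example"}
BLACKLIST = {"offers@bulk-mailer.example"}

# Hypothetical word patterns of the kind named in the text.
SPAM_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bviagra\b", r"\bweight\s+loss\b")
]


def classify(sender: str, body: str) -> str:
    """Return "spam" or "non-spam" using address lists, then word patterns."""
    address = sender.strip().lower()
    # White-listing: a trusted address is accepted regardless of content.
    if address in WHITELIST:
        return "non-spam"
    # Black-listing: a known offending address is rejected outright.
    if address in BLACKLIST:
        return "spam"
    # Word-pattern search over the message body.
    if any(pattern.search(body) for pattern in SPAM_PATTERNS):
        return "spam"
    return "non-spam"
```

In this sketch the address lists are consulted before the message body, so a message from a trusted address is accepted even when it contains a listed word. That ordering is a design assumption of the example.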
Experienced spammers have developed countermeasures to such classification tools, such as misspelling certain words (e.g., Vlagra), inserting unrelated text in spam messages, and using digital images of words or phrases instead of actual text. The efficiency of existing spam detection methods often decreases over time, since the form and content of spam messages change rapidly. As spammer countermeasures become increasingly complex, successful detection may benefit from increasingly sophisticated identification techniques.