As electronic communications have become increasingly popular for both personal and work-related purposes, more marketers send spam to advertise their products and/or services. As used herein, the term “spam” refers to electronic communication that is not requested and/or is non-consensual. Also known as “unsolicited commercial e-mail” (UCE), “unsolicited bulk e-mail” (UBE), “gray mail,” and just plain “junk mail,” spam is typically used to advertise products. The term “electronic communication” as used herein is to be interpreted broadly to include any type of electronic communication or message, including voice mail communications, short message service (SMS) communications, multimedia messaging service (MMS) communications, facsimile communications, etc.
However, the mass distribution of spam is not only a nuisance to many users, but also causes costly problems. Spam clutters the inboxes of users, who have to manually go through the incoming electronic communications to separate the unsolicited communications from legitimate communications. Furthermore, spam generates massive amounts of useless traffic in the networked electronic communication systems of many companies, which at best may slow down the delivery of important communications and at worst may crash the networked systems of the companies.
A current way to screen electronic communications is to analyze the content of incoming electronic communications. Existing software analyzes the message body of incoming electronic communications to generate a number of fingerprints or signatures. The message body of a spam typically contains a marketing message of the spam sender, who is also known as a spammer. However, the spammer may randomly make minor modifications in the body of the spam such that the fingerprints generated from the modified spam no longer match those of the previously identified spam. Therefore, another existing way to screen electronic communications for spam applies a similarity algorithm to catch electronic communications having content substantially similar to the content of a previously identified spam.
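The two content-based approaches described above can be illustrated with a minimal sketch. The source does not specify any particular fingerprinting or similarity algorithm, so everything here is an assumption made for illustration: the function names fingerprint, shingles, and similarity are hypothetical, SHA-256 stands in for the fingerprint, Jaccard similarity over three-word sequences stands in for the similarity algorithm, and the 0.5 threshold is arbitrary.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Exact-match fingerprint (assumed): SHA-256 of the lowercased,
    whitespace-normalized message body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(body: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences found in the message body."""
    words = body.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two bodies' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical messages: the second differs from the first by one word.
known_spam = "Buy cheap watches now at our online store and save big today"
modified = "Buy cheap watches now at our online shop and save big today"

THRESHOLD = 0.5  # assumed cut-off for "substantially similar"

same_fingerprint = fingerprint(known_spam) == fingerprint(modified)
score = similarity(known_spam, modified)

print(same_fingerprint)    # False: a one-word change alters the hash
print(score >= THRESHOLD)  # True: 7 of 13 distinct shingles are shared
```

In this sketch the one-word change defeats the exact fingerprint, while the similarity score of 7/13 still exceeds the assumed threshold, which is the behavior the paragraph above attributes to similarity-based screening.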
However, such content-based screening processes are typically not satisfactory because a spammer may randomize the contents of the spam to defeat these screening processes. For example, some spam is littered with random junk to avoid detection by the existing content-based screening processes.
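As a rough illustration of how random junk can defeat a content-based similarity check, the following sketch interleaves random tokens into a message and measures the result. It is not taken from the source: the helper names shingles, similarity, and add_junk, the use of Jaccard similarity over three-word sequences, and the junk-insertion pattern are all assumptions chosen for the example.

```python
import random
import string


def shingles(body, k=3):
    """Set of overlapping k-word sequences found in the message body."""
    words = body.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two bodies' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def add_junk(body, rng, every=2):
    """Insert a random 8-letter token after every `every` words (assumed pattern)."""
    out = []
    for i, word in enumerate(body.split(), start=1):
        out.append(word)
        if i % every == 0:
            out.append("".join(rng.choice(string.ascii_lowercase) for _ in range(8)))
    return " ".join(out)


# Hypothetical previously identified spam and a junk-littered variant of it.
known_spam = "Buy cheap watches now at our online store and save big today"
rng = random.Random(0)  # fixed seed so the example is repeatable
junked = add_junk(known_spam, rng)

score = similarity(known_spam, junked)
print(junked)
print(score)  # 0.0: every three-word window now contains a junk token
```

Because a junk token falls inside every three-word window of the modified message, no shingle survives intact and the similarity score drops to zero, even though a human reader would still recognize the marketing message.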