Electronic communications, including, for example, electronic mail (e-mail), instant messaging, chat, text messaging, short message service (SMS), pager communications, blog posts, and news items, pervade all aspects of our lives. The explosive growth of electronic content items has created an acute need for methods that allow users to quickly identify content items related to a topic of their choosing. The widespread use of electronic communications has also spurred their misuse. For example, users of electronic communications continue to receive a barrage of unsolicited or unwanted communications. Such electronic communications, termed spam, include unsolicited, unwanted, or duplicative communications, or electronic junk mail, usually sent indiscriminately and in large quantities to a large number of recipients. Spam may contain unwanted advertising, solicitation, inappropriate content, malicious content, abusive content, etc.
A spammer, responsible for sending spam communications, has little to no operating cost other than that required to manage mailing lists. As a result, the volume of spam has increased exponentially. Most spam consists of harmless advertising, although spammers have recently used spam for malicious purposes, such as collecting a user's personal information and spreading computer viruses. Regardless of its use, spam is annoying to users because of its undesirable content and sheer volume.
Over the years, techniques have been proposed to identify and filter spam communications. Most of these proposed techniques rely on algorithms based on machine learning, for example, naive Bayes and logistic regression. These techniques, however, suffer from significant drawbacks. For example, these techniques are slow to determine whether an incoming electronic communication is spam. In addition, they may result in a high false positive rate by erroneously classifying good electronic communications as spam.
In view of the above drawbacks, there is a need for improved systems and methods for identifying spam communications. In particular, there is a need for systems and methods for identifying spam communications that are more efficient and less prone to erroneous classifications or high false positive rates. In addition, there is a need for improved methods of identifying content items corresponding to a user query.