The receipt of unsolicited electronic mail (“e-mail”) messages (“spam”) has become a nuisance for networked computer users. In response, numerous techniques have been developed to detect spam and prevent it from being delivered to its intended destination. Several known methods of filtering spam are based upon detecting and deleting duplicate copies of a spam message. For example, one known method of separating spam from legitimate messages is called Filtering by Duplicate Detection (FDD). A sender of spam (a “spammer”) typically does not know whether two or more addresses on his address list point to the same mailbox. The FDD method exploits this by creating and maintaining two or more e-mail addresses that point to the same mailbox. Whenever the same message is received more than once, it is determined to be spam and is deleted. Additionally, information from the spam can be stored (e.g., in a database) for use in identifying other spam (e.g., e-mail from the same sender, with the same subject line, etc.).
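The duplicate-detection step of FDD can be sketched as follows. This is a minimal Python sketch under stated assumptions, not an implementation of any particular system: the names `Message`, `fingerprint`, and `FddMailbox` are hypothetical, and treating two copies as the “same message” when sender, subject, and body match exactly is an assumption. Following the example in the text, the sender of detected spam is stored and reused to reject further mail from that sender.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str  # the published alias this copy was addressed to
    subject: str
    body: str


def fingerprint(msg: Message) -> str:
    # Assumption: copies count as the "same message" when sender, subject and
    # body match exactly. The recipient is excluded because it differs per alias.
    data = "\x00".join((msg.sender, msg.subject, msg.body)).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class FddMailbox:
    """Hypothetical single mailbox reached through two or more published aliases."""

    def __init__(self, aliases):
        self.aliases = set(aliases)
        self.inbox = {}            # fingerprint -> Message seen exactly once so far
        self.spam_keys = set()     # fingerprints already judged to be spam
        self.spam_senders = set()  # stored spam information, reused to catch other spam

    def deliver(self, msg: Message) -> bool:
        """Return True if msg is kept in the inbox, False if it is deleted as spam."""
        if msg.recipient not in self.aliases:
            raise ValueError(f"{msg.recipient} does not forward to this mailbox")
        key = fingerprint(msg)
        if key in self.spam_keys or msg.sender in self.spam_senders:
            return False  # known spam: a third or later copy, or a known spam sender
        if key in self.inbox:
            # Second arrival: the sender mailed more than one of our aliases,
            # so the message is spam and the earlier copy is deleted as well.
            del self.inbox[key]
            self.spam_keys.add(key)
            self.spam_senders.add(msg.sender)
            return False
        self.inbox[key] = msg
        return True


box = FddMailbox({"alice@example.org", "a.smith@example.org"})
ad = dict(sender="promo@spam.example", subject="Deal!", body="Buy now")
print(box.deliver(Message(recipient="alice@example.org", **ad)))    # True
print(box.deliver(Message(recipient="a.smith@example.org", **ad)))  # False
print(len(box.inbox))                                               # 0
```

Keying on a hash keeps each lookup cheap; a looser notion of sameness would only require changing `fingerprint`.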
Another known method of filtering spam is called Collaborative Filtering (CF). In the CF method, many users work together to maintain a central repository of received spam messages, and each user's mail software checks this repository to see whether a given message is in it; if so, the message is deleted from the user's mailbox. The power of CF stems from automatic duplicate detection: the user's e-mail client software compares each newly arrived message with the list of spam messages maintained at the central server.
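A minimal Python sketch of the CF arrangement follows, with an in-process object standing in for the central server. All class and method names are hypothetical, and two further choices are assumptions rather than part of the described method: the repository stores SHA-256 fingerprints instead of the spam messages themselves, and bodies are compared after collapsing whitespace.

```python
import hashlib


def fingerprint(body: str) -> str:
    # Assumption: messages match if their bodies are identical after collapsing
    # runs of whitespace; a real system might define "same message" differently.
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamRepository:
    """Hypothetical central server shared by a group of users."""

    def __init__(self):
        self._known = set()  # fingerprints of messages some user reported as spam

    def report(self, body: str) -> None:
        self._known.add(fingerprint(body))

    def contains(self, body: str) -> bool:
        return fingerprint(body) in self._known


class MailClient:
    """Hypothetical per-user agent that screens mail against the repository."""

    def __init__(self, repo: SpamRepository):
        self.repo = repo
        self.mailbox = []

    def receive(self, body: str) -> bool:
        """Return True if delivered, False if deleted as a copy of known spam."""
        if self.repo.contains(body):
            return False
        self.mailbox.append(body)
        return True

    def mark_as_spam(self, body: str) -> None:
        # The user recognizes spam by hand: remove it locally, share it with the group.
        if body in self.mailbox:
            self.mailbox.remove(body)
        self.repo.report(body)


repo = SpamRepository()
alice, bob = MailClient(repo), MailClient(repo)
spam = "Make money fast!"
alice.receive(spam)                       # delivered: nobody has reported it yet
alice.mark_as_spam(spam)                  # Alice adds it to the central list
print(bob.receive("Make  money   fast!"))  # False: screened out for Bob
```

Because every client consults the same repository, one user's report protects the whole group: Bob's copy is rejected even though he has never seen the message before.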
A third method, Manual Filtering (MF), is the most widely used method on the Internet today. Users of MF read all or part of each message and decide whether it is spam. Due to properties of the human visual and cognitive system, MF users can detect a copy of a previously seen message more easily and quickly than they can determine whether a message is spam. Thus, MF users also benefit from duplicate detection through increased efficiency.
Existing approaches to solving the spam problem further include rule-based filtering and cryptographic authentication. See RSA Data Security; “S/MIME Central”; http://www.rsa.com/smime/; and S. Garfinkel; PGP: Pretty Good Privacy; Sebastopol, Calif.: O'Reilly and Assoc.; 1995. Various sendmail enhancements have also been proposed and implemented. See B. Costales, E. Allman, & N. Rickert; Sendmail; Sebastopol, Calif.: O'Reilly and Assoc.; 1993; see http://www.sendmail.org/ for the latest enhancements; and see email channels in R. J. Hall; How to avoid unwanted email; Comm. ACM 41(3), 88-95, March 1998. These techniques vary in effectiveness, applicability, and practicality. For surveys of anti-spam technology, see L. Cranor, B. LaMacchia; Spam!; to appear in Comm. ACM, 1998; http://www.research.att.com/~lorrie/pubs/spam!; and R. J. Hall; How to avoid unwanted email; Comm. ACM 41(3), 88-95, March 1998.
Summarizing, in FDD, the idea is to maintain and publicly distribute two (or more) e-mail addresses, both forwarding to the same mailbox. An e-mail software agent then automatically deletes any message that is received more than once. FDD gets its power from the fact that spammers (originators of spam) have no general way of telling when two addresses they have culled from newsgroups, web sites, etc., point to the same mailbox. In CF, the idea is that a group of e-mail users establishes a central server that maintains a list of known spam messages; each time a user receives a new spam message and recognizes it as such, that user adds it to the server's list. Each user then employs agent software that screens out any message appearing on the server's list. Even MF, where users read and recognize spam messages themselves, benefits from duplicate detection, because spammers often send the same message many times to the same list; owing to the power of human visual pattern recognition, the attentive MF user deletes the second and subsequent copies more quickly.