The present invention relates to filtering, and more particularly to filtering unwanted electronic mail messages.
The Internet is growing in popularity, and more and more people are conducting business over the Internet, advertising their products and services by generating and sending electronic mass mailings. Such electronic mail (e-mail) is usually unsolicited and is regarded as a nuisance by recipients because it occupies much of the storage space needed for necessary and important data processing.
For example, a mail server may have to reject an important and/or desired e-mail when its storage capacity is filled to the maximum with unwanted e-mail containing advertisements. Moreover, thin client systems such as set top boxes, PDAs, network computers, and pagers all have limited storage capacity. Unwanted mail in any one of such systems can tie up a finite resource for the user. In addition, a typical user wastes time by downloading voluminous but useless advertisement information. Unwanted mail typically slows down users by forcing the mail to be downloaded when the mail is delivered. Because this type of mail is so undesirable, it has acquired a special name in the Internet community: spam.
Therefore, it is highly desirable to have a filter system that screens out and turns away unwanted mail while allowing desired e-mail to pass through the system and reach the recipients. Presently, there are products that are capable of filtering out unwanted messages.
For example, a spam block method exists which keeps an index list of all spam agents, i.e., companies that generate mass unsolicited e-mails, and provides means to block any e-mail sent from a company on the list.
Another "junk mail" filter currently available employs filters which are based on predefined words and patterns as mentioned above. An incoming mail is designated as unwanted if its subject contains a known spam pattern.
Yet another e-mail service forwards all incoming e-mail to another address, filtering spam sender addresses. A master junk mail file is used to filter incoming e-mail against a list of known "spammers." In addition, a custom filter which is defined by a user may also be employed as a double filter to discard any unwanted e-mail.
While many of these and other techniques are currently being used in separate products, there is a continuing need for improved anti-spam models capable of filtering more effectively.
A system, method and computer program product are provided for filtering unwanted electronic mail messages. After receiving electronic mail messages, the electronic mail messages that are unwanted are filtered utilizing a combination of techniques including: compound filters, paragraph hashing, and Bayes rules. The electronic mail messages that are filtered as being unwanted are then categorized.
In one embodiment, the compound filters may utilize Boolean logic. Still yet, the compound filters may utilize conditional logic. As an option, the compound filters may each have a level associated therewith. Thus, the compound filters having a higher level associated therewith may be applied to the electronic mail messages prior to the compound filters having a lower level associated therewith.
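By way of illustration only, the leveled compound filters described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the message layout (a dictionary with "from", "subject" and "body" fields), the helper names (contains, all_of, any_of, negate, first_match) and the two sample rules are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed message shape: {"from": ..., "subject": ..., "body": ...}
Message = Dict[str, str]
Test = Callable[[Message], bool]


def contains(field: str, text: str) -> Test:
    """Primitive test: does `field` contain `text`, ignoring case?"""
    needle = text.lower()
    return lambda msg: needle in msg.get(field, "").lower()


def all_of(*tests: Test) -> Test:
    """Boolean AND of several tests."""
    return lambda msg: all(t(msg) for t in tests)


def any_of(*tests: Test) -> Test:
    """Boolean OR of several tests."""
    return lambda msg: any(t(msg) for t in tests)


def negate(test: Test) -> Test:
    """Boolean NOT of a test."""
    return lambda msg: not test(msg)


@dataclass
class CompoundFilter:
    name: str
    level: int   # filters with a higher level are applied first
    rule: Test   # Boolean combination of primitive tests


def first_match(msg: Message, filters: List[CompoundFilter]) -> Optional[str]:
    """Apply filters from highest level to lowest; return the first hit."""
    for f in sorted(filters, key=lambda f: f.level, reverse=True):
        if f.rule(msg):
            return f.name
    return None


# Hypothetical rules, invented for this sketch.
FILTERS = [
    CompoundFilter(
        "generic-promo", 1,
        all_of(contains("subject", "free"),
               any_of(contains("body", "click here"),
                      contains("body", "act now"))),
    ),
    CompoundFilter(
        "loan-offer", 2,
        all_of(contains("subject", "loan"),
               negate(contains("from", "@mybank.example"))),
    ),
]
```

In this sketch, a message that satisfies both rules is reported under "loan-offer", because that filter carries the higher level and is therefore evaluated first.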
In another embodiment, content of the electronic mail messages may be normalized prior to utilizing the paragraph hashing. Such normalizing may include removing punctuation of the content, normalizing a font of the content, and/or normalizing a case of the content.
In still another embodiment, the paragraph hashing may exclude a first and last paragraph of content of the electronic mail messages. Still yet, the paragraph hashing may utilize an MD5 algorithm. In use, the electronic mail messages may be filtered as being unwanted upon results of the paragraph hashing matching that of known unwanted electronic mail messages.
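As a hedged illustration, the normalization and paragraph hashing just described might look like the sketch below. Several details are assumptions rather than statements from the specification: paragraphs are taken to be separated by blank lines, only ASCII punctuation is removed, font normalization is omitted because the sketch operates on plain text, and a single matching interior paragraph is treated as sufficient to flag a message. The function names are hypothetical.

```python
import hashlib
import re
import string
from typing import List, Set

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(paragraph: str) -> str:
    """Normalize case, remove punctuation, and collapse whitespace."""
    lowered = paragraph.lower().translate(_STRIP_PUNCT)
    return " ".join(lowered.split())


def paragraph_hashes(body: str) -> List[str]:
    """MD5 hex digests of the interior paragraphs of a message body.

    The first and last paragraphs are excluded, so a body with two or
    fewer paragraphs yields no hashes.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    interior = paragraphs[1:-1]
    return [hashlib.md5(normalize(p).encode("utf-8")).hexdigest()
            for p in interior]


def matches_known_spam(body: str, known_hashes: Set[str]) -> bool:
    """True if any interior paragraph hash appears in the known-spam set."""
    return any(h in known_hashes for h in paragraph_hashes(body))
```

Because the greeting and sign-off paragraphs are skipped and the remaining text is normalized first, two mailings that differ only in salutation, capitalization, spacing or punctuation produce the same hashes in this sketch.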
In one aspect of the present embodiment, the hashes of known unwanted electronic mail messages may each have a level associated therewith. Thus, the hashes having a higher level associated therewith may be applied to the electronic mail messages prior to the hashes having a lower level associated therewith.
In another embodiment, the utilization of the Bayes rules may occur after the utilization of the compound filters and the paragraph hashing. Furthermore, the utilization of the Bayes rules may include identifying words of the electronic mail messages. Still yet, this may further include identifying a probability associated with each of the words. Optionally, the probability associated with each of the words may be identified using a Bayes rules database.
Thus, during use of the present embodiment, the electronic mail messages may be filtered as being unwanted based on a comparison involving the probability and a threshold. As an option, the threshold is user-defined.
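The word-probability lookup and threshold comparison may be illustrated by the following sketch. The specification does not state how the per-word probabilities are combined, so the sketch assumes the common naive Bayes combination (independent words, equal prior odds); the database contents, the default probability of 0.5 for unknown words, the clamping bounds and the function names are likewise hypothetical.

```python
import math
import re
from typing import Dict

# Hypothetical "Bayes rules database": word -> probability that a message
# containing the word is unwanted. The values are invented for this sketch.
WORD_SPAM_PROB: Dict[str, float] = {
    "free": 0.90,
    "winner": 0.95,
    "meeting": 0.10,
    "agenda": 0.05,
}


def spam_probability(text: str, db: Dict[str, float],
                     default: float = 0.5) -> float:
    """Combine per-word probabilities into one message-level probability.

    Assumed formula: P = prod(p_i) / (prod(p_i) + prod(1 - p_i)),
    evaluated in log space for numerical stability.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    log_spam = 0.0
    log_ham = 0.0
    for word in words:
        # Unknown words get the neutral default; clamp away from 0 and 1.
        p = min(max(db.get(word, default), 0.01), 0.99)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Bound the exponent so math.exp cannot overflow on long messages.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


def is_unwanted(text: str, db: Dict[str, float],
                threshold: float = 0.9) -> bool:
    """Flag the message when its probability meets the threshold."""
    return spam_probability(text, db) >= threshold
```

Raising the threshold argument makes the filter more conservative, which is one way a user-defined threshold could trade missed spam against wrongly filtered mail.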
Optionally, the electronic mail messages that are filtered as being unwanted may be categorized in a plurality of categories. Such categories may include pornographic, violent, language, etc. Moreover, the electronic mail messages that are not filtered may be displayed via an electronic mail message manager.
FIG. 1 illustrates a network architecture, in accordance with one embodiment.
FIG. 2 shows a representative hardware environment that may be associated with the data server computers and/or end user computers of FIG. 1, in accordance with one embodiment.
FIG. 3 illustrates a system adapted for filtering unwanted electronic mail messages, in accordance with one embodiment.
FIG. 4 illustrates a system adapted for filtering unwanted electronic mail messages involving multiple computers, in accordance with another embodiment.
FIG. 5 illustrates a system adapted for filtering unwanted electronic mail messages involving multiple users on a single computer, in accordance with still another embodiment.
FIG. 6 illustrates an exemplary anti-spam server including a front-end and a back-end adapted for filtering unwanted electronic mail messages, in accordance with one embodiment.
FIG. 7 illustrates a method for filtering unwanted electronic mail messages, in accordance with one embodiment.
FIG. 8 illustrates a system for filtering unwanted electronic mail messages, in accordance with one embodiment.
FIG. 9 illustrates a method for filtering unwanted electronic mail messages, in accordance with one embodiment.