The present invention relates to SPAM detection methods, and more particularly to intelligently detecting and removing SPAM.
The rapid increase in the number of users of electronic mail and the low cost of distributing electronic messages, for example, via the Internet and other communications networks, has made mass marketing via electronic mail (“e-mail”) an attractive advertising medium. Consequently, e-mail is now frequently used as the medium for widespread marketing broadcasts of unsolicited messages to e-mail addresses, commonly known as “SPAM.”
Electronic mass marketers (also called “spammers”) use a variety of techniques for obtaining e-mail address lists. For example, marketers obtain e-mail addresses from postings on various Internet sites such as news group sites, chat room sites, directory services sites, and message board sites, from mailing lists, and by identifying “mailto” address links provided on web pages. Using these and other similar methods, electronic mass marketers may effectively obtain large numbers of mailing addresses, which become targets for their advertisements and other unsolicited messages.
Users of Internet services and electronic mail, however, are not eager to have their e-mail boxes filled with unsolicited e-mails. This is an increasing problem for Internet service providers (ISPs) such as America Online (AOL) or Microsoft Network (MSN), and for other entities with easily identifiable e-mail addresses, such as large corporations (e.g., IBM, Microsoft, General Motors, etc.). ISPs object to junk mail because it reduces their users' satisfaction with their services. Corporations want to eliminate junk mail because it reduces worker productivity.
To date, the prior art has lacked mechanisms that can block SPAM effectively. Traditionally, SPAM detection has been based on specific rules, such as searching for key phrases in the subject header or determining whether the recipient is actually on the list of users intended to receive the e-mail.
More particularly, prior art systems rely on a fixed rule base or on blocking based on simple mail fields such as the sender, subject, or message body. As spammers become more creative in their mailings, it becomes increasingly difficult to block unwanted messages based on fixed information. Text search mechanisms in particular often generate a number of misses, or “falses,” due to the limitations of their search logic.
In particular, such text search mechanisms traditionally use static logic to locate particular known strings. This is often insufficient because spamming methods change dynamically over time. What is needed, therefore, is a process for dynamically and intelligently detecting unwanted SPAM electronic mail messages.
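For illustration, the kind of static string matching described above can be sketched in a few lines. The phrase list and function name are hypothetical; the sketch only shows that a fixed rule misses a trivially obfuscated variant of a phrase it already knows.

```python
# Hypothetical static rule set: fixed phrases searched for in the subject header.
BLOCKED_PHRASES = ["free money", "act now", "limited offer"]


def static_rule_is_spam(subject: str) -> bool:
    """Flag a message only if its subject contains a known phrase verbatim."""
    lowered = subject.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


print(static_rule_is_spam("FREE MONEY inside"))      # True
print(static_rule_is_spam("F.R.E.E  M0NEY inside"))  # False: trivially evaded
```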
A system, method and computer program product are provided for detecting an unwanted message. First, an electronic mail message is received. Text in the electronic mail message is decomposed. Statistics associated with the text are gathered using a statistical analyzer. A neural network engine coupled to the statistical analyzer is taught to recognize unwanted messages based on statistical indicators. The statistical indicators are analyzed utilizing the neural network engine for determining whether the electronic mail message is an unwanted message.
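The flow summarized above (receive a message, decompose its text, gather statistics, analyze them with a neural network engine) might look like the following minimal sketch. Everything concrete in it is an assumption for illustration: the three statistics (uppercase ratio, exclamation-mark ratio, and share of tokens from a hypothetical `MONEY_TERMS` set), the single logistic neuron standing in for the neural network engine, and the hand-picked `WEIGHTS`, `BIAS`, and 0.5 threshold. In the described system the weights would be learned rather than set by hand.

```python
import math
import re

# Hypothetical vocabulary used by one of the statistics below.
MONEY_TERMS = {"free", "money", "cash", "offer"}


def decompose(text: str) -> list[str]:
    """Decompose message text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def gather_statistics(text: str) -> list[float]:
    """Statistical analyzer: turn raw text into a small numeric feature vector."""
    tokens = decompose(text)
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    exclaim_ratio = text.count("!") / max(len(text), 1)
    money_ratio = sum(t in MONEY_TERMS for t in tokens) / max(len(tokens), 1)
    return [upper_ratio, exclaim_ratio, money_ratio]


def neural_score(stats: list[float], weights: list[float], bias: float) -> float:
    """Single logistic neuron: weighted sum of the statistics squashed to (0, 1)."""
    activation = bias + sum(w * s for w, s in zip(weights, stats))
    return 1.0 / (1.0 + math.exp(-activation))


# Illustrative, hand-picked parameters; a trained engine would learn these.
WEIGHTS = [2.0, 20.0, 5.0]
BIAS = -1.5


def is_unwanted(text: str, threshold: float = 0.5) -> bool:
    """Score the message's statistics and compare against a threshold."""
    return neural_score(gather_statistics(text), WEIGHTS, BIAS) >= threshold


print(is_unwanted("FREE MONEY!!! Act now!!!"))  # True
print(is_unwanted("Meeting moved to 3pm"))      # False
```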
As mentioned above, the neural network engine can be taught to recognize unwanted messages. In one process of teaching the neural network, examples are provided to the neural network engine. The examples are of wanted messages and unwanted messages. Each of the examples is associated with a desired output. Each of the examples is processed with statistics by the neural network engine for generating weights for the statistics. Each of the weights is used to denote wanted and unwanted messages. Preferably, the neural network engine utilizes adaptive linear combination for adjusting the weights. Logic associated with the neural network engine is updated based on the processing by the neural network engine.
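One reading of “adaptive linear combination” is the classic adaptive linear combiner (ADALINE) trained with the Widrow-Hoff least-mean-squares (LMS) rule; the sketch below assumes that interpretation. Each example pairs a statistics vector with a desired output (assumed here to be +1.0 for unwanted and -1.0 for wanted), and each weight is adjusted in proportion to the output error times its input. The two statistics and the six training examples are made up for the demonstration.

```python
import random

# Hypothetical training examples: ([uppercase_ratio, exclamation_ratio], desired),
# with desired = +1.0 for an unwanted message and -1.0 for a wanted one.
EXAMPLES = [
    ([0.9, 0.1], 1.0),
    ([0.7, 0.2], 1.0),
    ([0.5, 0.3], 1.0),
    ([0.1, 0.0], -1.0),
    ([0.02, 0.04], -1.0),
    ([0.06, 0.02], -1.0),
]


def train_adaline(examples, learning_rate=0.1, epochs=2000, seed=0):
    """Train an adaptive linear combiner with the Widrow-Hoff (LMS) rule."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    order = list(examples)
    for _ in range(epochs):
        rng.shuffle(order)
        for features, desired in order:
            # Linear combination of the statistics; no squashing function.
            output = bias + sum(w * x for w, x in zip(weights, features))
            error = desired - output
            # LMS step: adjust each weight in proportion to error * input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def classify(features, weights, bias):
    """A positive combiner output denotes unwanted; otherwise wanted."""
    output = bias + sum(w * x for w, x in zip(weights, features))
    return "unwanted" if output > 0 else "wanted"


weights, bias = train_adaline(EXAMPLES)
for features, desired in EXAMPLES:
    print(features, desired, classify(features, weights, bias))
```

The toy examples were chosen to lie on two parallel lines in feature space, so an exact linear fit exists and the LMS updates settle close to weights of 2 and 4 with a bias of -1.2; real message statistics would rarely separate this cleanly.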
In another process for teaching the neural network engine, the neural network engine is updated to recognize an unwanted message. The message is identified as an unwanted message. The features of the message that make the message unwanted are identified, wherein the identified features are stored and used by the neural network to identify subsequent unwanted messages. Preferably, a graphical user interface is provided for allowing a user to identify the features of the message that make the message unwanted.
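The second teaching process, in which a user marks the features that make a message unwanted, could be backed by a store like the one sketched below. The class and method names are hypothetical, features are assumed to be literal phrases, and `selected_phrases` stands in for whatever the graphical user interface would return. The resulting `match_ratio` is one possible way to feed the stored features to the neural network engine as an additional statistic.

```python
class UnwantedFeatureStore:
    """Hypothetical store of phrases a user marked as making a message unwanted."""

    def __init__(self) -> None:
        self.features: set[str] = set()

    def mark_unwanted(self, message: str, selected_phrases: list[str]) -> None:
        """Store user-selected phrases that actually occur in the flagged message."""
        lowered = message.lower()
        for phrase in selected_phrases:
            normalized = phrase.lower().strip()
            if normalized and normalized in lowered:
                self.features.add(normalized)

    def match_ratio(self, message: str) -> float:
        """Fraction of stored features present in a new message."""
        if not self.features:
            return 0.0
        lowered = message.lower()
        return sum(f in lowered for f in self.features) / len(self.features)


store = UnwantedFeatureStore()
store.mark_unwanted(
    "Claim your prize now at example.com",
    ["Claim your prize", "example.com", "not in message"],
)
print(sorted(store.features))                              # ['claim your prize', 'example.com']
print(store.match_ratio("Hurry! Claim your prize today"))  # 0.5
print(store.match_ratio("Lunch at noon?"))                 # 0.0
```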
In another aspect of the present embodiment, the neural network engine uses artificial intelligence that analyzes previous user input for determining whether the message is unwanted.
A system, method and computer program product are also provided for teaching a neural network engine to recognize an unwanted message. Examples are provided to a neural network engine. The examples are of wanted messages and unwanted messages. Each of the examples is associated with a desired output. Each of the examples is processed with statistics for generating weights for the statistics. Each of the weights is used to denote wanted and unwanted messages. Logic associated with the neural network engine is updated based on the processing by the neural network engine.
In one aspect of the present embodiment, the neural network engine utilizes adaptive linear combination for adjusting the weights. In another aspect of the present embodiment, the neural network engine is updated to recognize an unwanted message. The message is identified as an unwanted message. The features of the message that make the message unwanted are identified. The identified features are stored and used by the neural network to identify subsequent unwanted messages. Preferably, a graphical user interface is provided for allowing a user to identify the features of the message that make the message unwanted.