The growth of the Internet in recent years has changed the way people communicate. For example, people often use one or more of email, Instant Messaging (IM), Short Message Service (SMS), chat rooms, and the like. People employ such messaging services on desktops and laptops, as well as on mobile phones.
The increased use of the Internet for communications has been accompanied by an increased use of email and other messaging for mass marketing. This form of marketing has become an attractive advertising mechanism for individuals, businesses, and the like, because it enables them to reach a large audience at minimal cost. However, the use of messaging in this manner is often problematic for, and undesired by, the recipients. Hence, a general term, spam, has arisen to describe these unsolicited messages. For certain types of messaging environments, an analogous term is used. For example, unsolicited messages in instant messaging (IM) environments are sometimes referred to as spim.
Such activities as spam, spim, and other forms of unsolicited messages have resulted in many people becoming more frustrated with their service providers and with communicating over public networks in general. Users often expect their service providers, or others, to protect them from such abuses.
Some service providers use a message filtering system to screen out as many unsolicited messages as possible without screening out valid messages. One filtering technique uses a Bayesian learning system that learns to recognize spam based on messages identified by a user as spam. A Bayesian classifying system generally consists of a Bayesian algorithm and training information from each user, corresponding to classification selections made by that user to identify valid and unsolicited messages (e.g., a vote that a message is spam or is not spam). The training information is used with the algorithm to make classification decisions on subsequent messages before the user sees the messages. To prevent false classification of messages, a message may be classified as “unsure” by the Bayesian classifier. This typically happens in cases where sufficient statistics are unavailable in the user's training database.
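The behavior described above can be illustrated with a minimal sketch. It is not the filtering system of any particular service provider; it assumes a simplified naive Bayes model over whitespace-separated tokens, and the class name `UserBayesFilter`, the cutoffs `spam_cutoff` and `ham_cutoff`, and the `min_votes` threshold are all hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter


class UserBayesFilter:
    """Hypothetical per-user Bayesian filter with an "unsure" band."""

    def __init__(self, spam_cutoff=0.9, ham_cutoff=0.1, min_votes=5):
        self.spam_tokens = Counter()  # token -> number of spam votes containing it
        self.ham_tokens = Counter()   # token -> number of valid votes containing it
        self.spam_votes = 0
        self.ham_votes = 0
        self.spam_cutoff = spam_cutoff  # assumed threshold, not from the source
        self.ham_cutoff = ham_cutoff    # assumed threshold, not from the source
        self.min_votes = min_votes      # assumed minimum training size

    def train(self, text, is_spam):
        """Record one user vote (is spam / is not spam) for a message."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_votes += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_votes += 1

    def spam_probability(self, text):
        """Estimate P(spam | tokens) using add-one smoothed log-odds."""
        log_odds = math.log((self.spam_votes + 1) / (self.ham_votes + 1))
        for tok in set(text.lower().split()):
            p_spam = (self.spam_tokens[tok] + 1) / (self.spam_votes + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.ham_votes + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on extreme scores.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify(self, text):
        """Return "spam", "valid", or "unsure"."""
        # Too few votes in the user's training data: no reliable statistics.
        if self.spam_votes + self.ham_votes < self.min_votes:
            return "unsure"
        p = self.spam_probability(text)
        if p >= self.spam_cutoff:
            return "spam"
        if p <= self.ham_cutoff:
            return "valid"
        return "unsure"
```

In this sketch a message falls into the “unsure” band either when the user has cast too few votes or when the message's tokens have not appeared in the user's training data, so the estimated probability stays between the two cutoffs.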
Due to the lack of training statistics in the user's database, a large percentage of the messages presented to the Bayesian classifier are often classified as “unsure.” This wastes valuable computing resources such as bandwidth, server CPU time, and the like. Classifying a message as “unsure” also contributes no information toward the final classification of the message. Thus, it is with respect to these considerations, and others, that the present invention was made.