Spam classification engines analyze the content of email messages and attempt to determine which messages are spam using statistical techniques such as Bayesian analysis. Bayesian spam filtering relies on established probabilities of specific words appearing in spam or in legitimate email. For example, nonsense words, as well as certain words such as “Viagra”, “Refinance”, and “Mortgage”, appear frequently in spam but rarely, or at least less frequently, in legitimate email. The presence of such terms therefore increases the probability that an email is spam. A Bayesian spam classification engine has no inherent knowledge of these probabilities; instead, it establishes them by being trained on a set of email messages.
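The training and scoring described above can be sketched as a small naive Bayes filter. This is a minimal illustration under stated assumptions, not the engine the text refers to: the class and method names are hypothetical, each message is treated as a set of distinct words, word probabilities use add-one (Laplace) smoothing, and only the words that appear in a message contribute to its score, which is a common simplification.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesSpamFilter:
    """Illustrative naive Bayes spam filter (hypothetical names, not a real API)."""

    def __init__(self) -> None:
        # Per class: number of training messages containing each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # Per class: number of training messages seen.
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Update the counts from one labeled message ("spam" or "ham")."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words: set[str], label: str) -> float:
        """log P(label) + sum of log P(word present | label), add-one smoothed."""
        total = sum(self.message_counts.values())
        n = self.message_counts[label]
        score = math.log((n + 1) / (total + 2))
        for word in words:
            score += math.log((self.word_counts[label][word] + 1) / (n + 2))
        return score

    def spam_probability(self, text: str) -> float:
        """Posterior P(spam | words present in text)."""
        words = tokenize(text)
        diff = self._log_score(words, "spam") - self._log_score(words, "ham")
        # Logistic function of the log-odds, written to avoid overflow in exp().
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        e = math.exp(diff)
        return e / (1.0 + e)


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    for msg in ["Cheap Viagra now", "Refinance your mortgage today",
                "Viagra and mortgage deals"]:
        f.train(msg, "spam")
    for msg in ["Meeting moved to noon", "Lunch today with the team",
                "Notes from the meeting"]:
        f.train(msg, "ham")
    print(round(f.spam_probability("viagra refinance offer"), 3))  # 0.857
```

The toy training messages are made up for illustration. Because the filter starts with empty counts, every probability it uses comes from training, which mirrors the point that the engine has no inherent knowledge of word probabilities.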
Unlike with other statistical classification methods, predicting the error rates of a Bayesian classifier (its rates of false positives, in which legitimate email is classified as spam, and false negatives, in which spam is classified as legitimate) is very difficult. Tuning a Bayesian engine to produce predictable error rates is even more difficult. When email is classified as spam or non-spam, individual users have varying tolerance for false positives and false negatives. Existing Bayesian classifiers simply use a single, fixed value intended to suit everyone. However, many users find that this value does not produce the results they want and would like to be able to adjust the error rate.
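To make the two error rates concrete, the sketch below measures them for a set of scored, labeled messages. It assumes, purely for illustration, that the classifier's fixed value is a spam-probability cutoff at or above which a message is labeled spam; the text does not say what that value is. The function name and the scores are hypothetical and are not measured data.

```python
def error_rates(scored, threshold):
    """Return (false_positive_rate, false_negative_rate).

    scored: list of (spam_probability, is_spam) pairs from a hypothetical classifier.
    A message is classified as spam when spam_probability >= threshold.
    """
    ham_scores = [p for p, is_spam in scored if not is_spam]
    spam_scores = [p for p, is_spam in scored if is_spam]
    # False positive: legitimate message scored at or above the cutoff.
    false_pos = sum(1 for p in ham_scores if p >= threshold)
    # False negative: spam message scored below the cutoff.
    false_neg = sum(1 for p in spam_scores if p < threshold)
    fpr = false_pos / len(ham_scores) if ham_scores else 0.0
    fnr = false_neg / len(spam_scores) if spam_scores else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Illustrative scores only.
    scores = [(0.05, False), (0.40, False), (0.70, False),
              (0.30, True), (0.80, True), (0.95, True)]
    for t in (0.25, 0.5, 0.9):
        fpr, fnr = error_rates(scores, t)
        print(f"threshold {t:.2f}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

With these illustrative scores, a cutoff of 0.5 yields one false positive among three legitimate messages and one false negative among three spam messages, while raising it to 0.9 removes the false positive but lets two of the three spam messages through. A single fixed cutoff therefore fixes one point on this trade-off for every user.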
It would be desirable to allow users to adjust the error rate of a Bayesian classifier up or down as desired, so that each user can tailor it to his or her personal tolerance level.