Classifiers based on machine learning techniques may be used to classify data items of unknown classification. One area in which such classifiers may be used is in filtering mass, unsolicited, commercial e-mailings (colloquially known as e-mail spam or spam e-mail). Spam e-mail is akin to junk mail sent through the postal service. However, because spam e-mail requires neither paper nor postage, the costs incurred by the sender of spam e-mail are quite low when compared to the costs incurred by conventional junk mail senders. Consequently, e-mail users now receive a significant amount of spam e-mail on a daily basis.
Spam e-mail impacts both e-mail users and e-mail providers. For e-mail users, spam e-mail can be disruptive, annoying, and time consuming. For e-mail and network service providers, spam e-mail represents tangible costs in terms of storage and bandwidth usage, and these costs are not negligible given the large volume of spam e-mail being sent.