Electronic messaging, and particularly electronic mail (email), is increasingly used as a means for disseminating unwanted advertisements and promotions (commonly called “spam”) to network users. Email can also be misused in malicious attacks, such as sending a large volume of email to an address in a denial-of-service attack, and in attempts to acquire sensitive information in phishing attacks.
Common techniques for thwarting spam and malicious email involve filtering systems. In one filtering technique, data is extracted from the content of two classes of example messages (e.g., spam and non-spam messages), and a filter is applied to discriminate probabilistically between the two classes; such filters are commonly referred to as “content-based filters.” These machine learning filters typically rely on exact-match techniques to distinguish spam messages from good messages.
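As an illustration only, the following Python sketch shows one well-known kind of probabilistic content-based filter: a multinomial naive Bayes classifier trained on word counts from spam and non-spam examples. It is not presented as the filter of any particular system; the tokenizer, the class labels `"spam"` and `"ham"`, and the toy training messages are hypothetical assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric words.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes over word counts (illustrative sketch).

    Assumes both classes ("spam" and "ham") receive at least one
    training message before spam_probability() is called.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        # Accumulate per-class word counts and per-class message counts.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior P(class) plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | text), shifting by the
        # maximum first so that exp() cannot overflow or underflow to zero.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


# Toy training data (hypothetical examples).
example_filter = NaiveBayesFilter()
example_filter.train("win free money now", "spam")
example_filter.train("claim your free prize", "spam")
example_filter.train("meeting agenda for tomorrow", "ham")
example_filter.train("lunch tomorrow with the team", "ham")
```

Because the score depends on the exact tokens seen during training, a message whose words are deliberately misspelled or padded with stray characters produces tokens the filter has never seen, which illustrates the weakness discussed next.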
Spammers and senders of malicious email can fool conventional content-based filters by modifying their messages to look like good messages, or by inserting a variety of erroneous characters throughout a message to evade or confuse character recognition systems. Thus, such conventional filters provide limited protection against spam and malicious messages.
In other techniques, a Domain Name System (DNS) blackhole list (DNSBL) or a real-time blackhole list (RBL) may be referenced to identify IP addresses that are reputed to send email spam. Email servers can be configured to reject or flag messages sent from a site listed on these lists. Unfortunately, these lists may also block legitimate email sent from shared email servers that relay spam, and it can be difficult to have legitimate addresses removed from the lists.
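A DNSBL is conventionally queried through ordinary DNS: the octets of an IPv4 address are reversed and prepended to the list's zone name, and an A record in 127.0.0.0/8 indicates that the address is listed (the convention described in RFC 5782). The Python sketch below illustrates that lookup under stated assumptions; the zone `dnsbl.example.org` is a hypothetical placeholder, and treating any resolution failure as "not listed" is a design choice of this sketch rather than a requirement.

```python
import ipaddress
import socket

# Hypothetical DNSBL zone used only for illustration.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNSBL query name for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """Return True if the DNSBL answers with an address in 127.0.0.0/8."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed";
        # this fails open, which is an assumption of this sketch.
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, checking the sending address `192.0.2.99` would resolve the name `99.2.0.192.dnsbl.example.org`. Because the verdict is attached to the IP address rather than to the message, every message from a shared server inherits that server's listing, which is why legitimate email can be blocked along with spam.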