The present disclosure relates generally to communication networks, and, more particularly, to methods, systems, and computer program products for detecting and responding to email address harvest attacks and associated spam attacks.
Email address harvesting is generally defined as a means of obtaining a list of valid email addresses associated with email domains for the purpose of using this list to address spam messages. An email address harvest attack occurs when a spammer attempts to obtain an email address list by connecting, via Simple Mail Transfer Protocol (SMTP), to an email domain and using an automated dictionary-name type of attack to check whether the email domain will accept delivery of email for a list of email addresses. Via trial and error, the spammer collects a list of valid email addresses. These attacks may be used to build initial lists as well as to refine and maintain existing harvested lists. Lists can also be obtained by purchasing them from other spammers or by systematically indexing web pages to obtain embedded email addresses.
Each day, thousands of SMTP mail systems controlled by spammers connect or attempt to connect to large Internet Service Providers (ISPs) to harvest email addresses. Some spammers simply connect to the ISP domain, provide a list of “To” addresses, get the responses from the ISP, and disconnect. Other spammers connect, provide a list of “To” addresses, get responses, supply one or more messages to be delivered to the valid addresses, and disconnect. This sending of email may be used to obfuscate the attack or to get around defenses that the ISP domain has deployed to stop harvest attacks. In general, the ISPs accept the connections, validate the addresses submitted, and deliver the email, without knowing that a harvest attack has occurred. Harvest attacks may also go unnoticed because ISPs are not focusing on them and because it is very hard to differentiate them from valid mail sessions.
ISPs that focus on mitigating harvest attacks typically direct their efforts first at determining that a harvest attack is occurring. ISPs may do this by counting the number of failed address lookups during an SMTP session, or the percentage of failed address lookups during a session, and if that value surpasses the configured threshold, a harvest attack is determined to be occurring. Sometimes a number and a percentage are used jointly to establish minimum thresholds. The thresholds may be kept high to reduce false positives that can occur when legitimate marketers send to address lists that may be out of date. Once a harvest attack is determined to be occurring, the SMTP session is dropped. ISPs may also attempt to block harvester mail systems by blocking the Internet Protocol (IP) addresses of the mail systems of suspected harvesters. ISPs either obtain a harvest blacklist from a vendor, or they compile their own by analyzing “failed address lookups” across all SMTP sessions, arriving at a reputation for each mail system, and establishing thresholds used to determine whether a particular mail system's IP address should be added to the harvest blacklist. To be effective at blocking future harvest attacks, the harvest blacklist may block connection attempts up front, before any responses to email address lookups are allowed. In many cases, the IP address blacklists that are used to block mail systems that are spamming are also used to block harvest attacks.
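The threshold test and blacklist compilation described above can be sketched as follows. This is only an illustrative sketch: the names `HarvestThresholds`, `is_harvest_session`, and `compile_blacklist`, the default threshold values, and the choice to use the summed failed lookups per IP address as the "reputation" are all assumptions made for the example, not details given in this disclosure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class HarvestThresholds:
    # Hypothetical values; the text does not state specific thresholds.
    failed_count: int = 20
    failed_percent: float = 50.0


def is_harvest_session(failed: int, total: int, t: HarvestThresholds) -> bool:
    """Flag a session only when BOTH the count and the percentage of
    failed address lookups surpass their thresholds (the joint case)."""
    if total == 0:
        return False
    percent = 100.0 * failed / total
    return failed > t.failed_count and percent > t.failed_percent


def compile_blacklist(
    sessions: Iterable[Tuple[str, int]], ip_failed_threshold: int
) -> Set[str]:
    """Aggregate failed lookups per source IP across sessions and list
    every IP whose total surpasses the (assumed) reputation threshold."""
    totals: Counter = Counter()
    for source_ip, failed in sessions:
        totals[source_ip] += failed
    return {ip for ip, n in totals.items() if n > ip_failed_threshold}
```

With the joint test, a marketer whose stale list produces 30 failures out of 1,000 lookups is not flagged, because the percentage stays low even though the count is above the threshold.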
More specifically, the blocking and session dropping may operate as follows. When an originating mail system attempts to connect, the originating IP address is checked against the harvest blacklist. If the IP address is on the harvest blacklist, the connection is rejected. If the connection is accepted, the SMTP session begins with the initial salutation (EHLO/HELO) and the sender address (MAIL FROM:), followed by the recipient address (RCPT TO:) SMTP commands. As each recipient address is submitted, an address lookup is done to check whether the recipient address is valid. If the address is valid, the recipient mail system provides a valid response. If the address is invalid, the recipient mail system provides an invalid response. The recipient mail system counts the number of failed address lookups during the session, and if the count surpasses the harvest attack threshold, the connection is dropped. If the count does not surpass the harvest attack threshold, the connection remains up until the originating mail system issues a disconnect request, submits a message followed by a disconnect request, or times out.
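The sequence above can be sketched as a small simulation. This is a simplified illustration under stated assumptions: the function name `handle_connection`, its parameters, and the log strings are hypothetical, the EHLO/HELO and MAIL FROM steps are not modeled, and no real SMTP library or response codes are used.

```python
from typing import Iterable, List, Set


def handle_connection(
    source_ip: str,
    recipients: Iterable[str],
    blacklist: Set[str],
    valid_addresses: Set[str],
    failed_threshold: int,
) -> List[str]:
    """Return a log of what the recipient mail system does for one
    connection attempt (hypothetical event names)."""
    # Step 1: check the originating IP address against the blacklist.
    if source_ip in blacklist:
        return ["connection rejected"]

    log = ["connection accepted"]
    failed = 0
    # Step 2: look up each submitted recipient address in turn.
    for rcpt in recipients:
        if rcpt in valid_addresses:
            log.append(f"valid: {rcpt}")
        else:
            failed += 1
            log.append(f"invalid: {rcpt}")
            # Step 3: drop the session once the count surpasses the threshold.
            if failed > failed_threshold:
                log.append("connection dropped")
                return log

    # Step 4: otherwise the session stays up for a message, a
    # disconnect request, or a timeout.
    log.append("awaiting message, disconnect, or timeout")
    return log
```

Note that in this sketch every lookup before the drop still returns a valid or invalid response, so the originating system learns something from each address it submits.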
A problem with such solutions is that spammers may easily execute harvest attacks that get past blacklists or failed address lookup thresholds. They may send from a vast number of different IP addresses that have no reputation, and they may limit their failed address lookup attempts to stay under the threshold the ISP has established. They may modify their limits if they determine that the ISP has modified its thresholds. Also, spammers may include a simple message as part of their harvest attack so as to blend in with normal message traffic. Spammers may also send from as many different IP addresses as needed to generate the volume of lookups required to complete their desired level of maintenance of their harvest lists. If some of their harvest attack sessions are dropped or their IP addresses are blocked from executing a harvest attack, spammers may simply execute additional harvest attacks until they reach the desired level. From experience, it appears that spammers establish these levels on a daily basis (e.g., check X addresses per day, or obtain Y successful address lookups per day). As a result, ISPs may constantly update their blacklists in the hope of mitigating the next attack. ISPs typically do not modify their thresholds often because in many cases doing so may require bringing down the mail application to update the configuration, which may cause availability or resource constraints. Moreover, lowering the thresholds may cause false positives and associated complaint calls. Given that many harvest attacks are used to refine or maintain already harvested lists, these attacks may not be identified, because during these attacks many addresses are valid and the number of failed address lookups is normally below the threshold an ISP would set for determining that a harvest attack is occurring.
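A short worked calculation illustrates why fixed per-session thresholds are weak against list-maintenance attacks. The figures below (one million addresses checked per day, 5% of them invalid, a threshold of 20 failed lookups) are hypothetical values chosen for the example, and the helper names are assumptions.

```python
def failed_percent(failed: int, total: int) -> float:
    """Percentage of lookups that failed."""
    return 100.0 * failed / total


def sessions_to_stay_under(total_failed: int, failed_threshold: int) -> int:
    """Minimum number of sessions needed so that no single session has
    more than failed_threshold failed lookups (ceiling division)."""
    if failed_threshold < 1:
        raise ValueError("failed_threshold must be at least 1")
    return -(-total_failed // failed_threshold)


# Hypothetical daily list-maintenance run against one ISP.
checked = 1_000_000   # addresses checked per day (assumed)
failed = 50_000       # 5% of the list is no longer valid (assumed)

print(failed_percent(failed, checked))        # 5.0
print(sessions_to_stay_under(failed, 20))     # 2500
```

Under these assumptions the overall failure rate is only 5%, far below any percentage threshold set high enough to spare legitimate marketers, and spreading the run over 2,500 sessions keeps every session at or below the count threshold.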
The effectiveness of current harvest mitigation techniques may be limited because it is generally very hard to mitigate attacks after the fact. Over the last several years, an ever larger proportion of an ISP's email addresses may have been harvested. It is common for large ISPs to receive harvest attacks that check millions of addresses each day, allowing spammers to add the addresses of new accounts and remove old addresses whose accounts are no longer active. Once addresses are harvested, they are typically then used in spam attacks. As a result, it has gotten to the point where sometimes up to 100% of all addresses used in a large spam attack, which may be addressed to tens of millions of recipients, are valid. Spammers are continuously trying to obtain a higher proportion of valid addresses for a particular domain.
As a result of uncertainty about the sender's identity, the lack of reputation, the inability to identify harvest attacks, and spammers' effectiveness at obfuscating these attacks, ISPs have generally had a hard time improving the effectiveness of their current email address harvest mitigation processes. As a consequence, many email address harvest attacks go unnoticed and unabated, directly support spammers' ability to increase delivery of spam into ISP members' mailboxes, and adversely affect those members' experience with using email. In addition, from an ISP's perspective, email address harvest attacks have greatly affected the cost of providing email service.