Electronic mail (email) was one of the very first applications of the Internet. Internet users are now facing a growing problem: Unsolicited Mass Email (UME). UME is defined as email messages that are sent in very large quantities to as many recipients as possible, regardless of whether the recipients wish to receive these messages. The senders of UME, hereafter called mass-mailers, typically ignore requests to be removed from future mass mailings.
Mass-mailers collect email addresses from several sources, including:
1. The Internet: recipients often post their addresses on web sites or in online forums, where mass-mailers collect them with special tools dubbed “address harvesters”.
2. Promotional web sites: recipients are enticed to enter their email address for a chance to win a prize, which is small if it exists at all, and are thereafter included in mass-mailers' address lists.
3. Legitimate contact lists: some companies periodically sell the names and email addresses of prospects. Address and contact lists are also part of the assets that are liquidated when a company goes bankrupt. Mass-mailers buy these email addresses and resell them to other mass-mailers.
Once an email address has been included in a list used by a mass-mailer, it is often quickly resold and used by many other mass-mailers. The recipient will receive an ever-larger amount of UME. Such an email address is said to be compromised.
Protecting one's email address from being compromised through methods 1 and 2 is relatively easy (i.e., don't post your email address and don't give it to unknown entities). However, method 3 is impossible to avoid with the current email system. Corporate workers must often give their email address to contacts, suppliers, customers and other entities or persons outside their company, each of which can potentially disseminate these addresses and add them to mass-mailing lists.
Current solutions to the UME problem are unsatisfactory and often impractical for the business world, if not for most email users. These solutions include the following.
1. Filtering known UME senders. This doesn't work well because mass-mailers often forge their sender address (i.e., the “From:” field).
2. Filtering all unknown senders (a.k.a. “white listing”). This is not practical because business recipients often receive emails from new contacts, for example after initiating communication over the telephone or in person.
3. UME detection (a.k.a. “message matching”). This method employs several email addresses that are posted on the Internet for the express purpose of being harvested by mass-mailers and thereby compromised. The idea is that if these email addresses receive a message, that message must be UME, and all similar email messages can be tagged as UME. However, mass-mailers now sprinkle messages with random parts, adding or changing character strings in individual messages, which can defeat message matching systems.
4. Filtering on content. Most UME messages contain “trigger words” that can be detected by filtering software. For instance, “mortgage” is unlikely to show up in your professional email if you aren't in the real estate business, but it is frequent in UME. This filtering can be quite efficient. However, recent trends in UME show that mass-mailers avoid trigger words by misspelling or altering them (e.g., “m0rtgage” or “m:or.t.gage” instead of “mortgage”), which decreases the filters' efficiency. Other mass-mailers fool naive filtering software by inserting comments within HTML messages to break trigger words (e.g., “mo<!--ZZZZ-->rtgage”). Note that this insertion of useless strings in UME messages also tends to increase their average size.
5. Adaptive filtering. Adaptive filters can be taught to recognize the format and layout of UME messages, which often rely on HTML formatting with several images. However, legitimate emails containing genuine press releases and newsletters are then as likely to be filtered out as UME is. Besides, mass-mailers have started sending UME with JavaScript encoding as well as UME entirely composed of one or more images, which cannot be filtered on content. These JavaScript-encoded and image-based UME messages are of ever-increasing size.
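The weaknesses of message matching and content filtering described in items 3 and 4 can be illustrated with a minimal sketch. The function names, the single-word trigger list, and the use of an exact SHA-256 fingerprint as a stand-in for "message matching" are assumptions made for illustration only; real filters are more elaborate.

```python
import hashlib
import re

# Hypothetical trigger list; a real filter would use a much larger one.
TRIGGER_WORDS = {"mortgage"}

# Matches HTML comments such as <!--ZZZZ-->.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def fingerprint(body: str) -> str:
    """Exact-match fingerprint, a simplistic stand-in for message matching."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def naive_filter(body: str) -> bool:
    """Return True if any trigger word appears verbatim in the body."""
    text = body.lower()
    return any(word in text for word in TRIGGER_WORDS)


def comment_aware_filter(body: str) -> bool:
    """Strip HTML comments first, then apply the naive trigger-word check."""
    return naive_filter(HTML_COMMENT.sub("", body))


plain = "Lower your mortgage rate today"
broken = "Lower your mo<!--ZZZZ-->rtgage rate today"
misspelled = "Lower your m0rtgage rate today"

print(naive_filter(plain))               # True: trigger word found
print(naive_filter(broken))              # False: the comment splits the word
print(comment_aware_filter(broken))      # True: comment removed before matching
print(comment_aware_filter(misspelled))  # False: misspelling still evades it

# A short random suffix is enough to change an exact fingerprint.
print(fingerprint(plain) == fingerprint(plain + " xq7f"))  # False
```

Stripping comments recovers the broken trigger word, but the misspelled variant and the randomized message still slip through, which is the arms race the list above describes.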
Other problems with existing anti-UME solutions include:
1. Client-side filtering is not a good solution. When filtering is done on the client side, UME is sent to the recipient's machine, only to be discarded by the recipient's mail agent. Meanwhile, the network connection of the recipient's machine is clogged by UME. When the recipient must download email using a slow dial-up connection (e.g., when the recipient is away from a corporate office equipped with high-speed networking), the time wasted downloading UME can be significant.
2. Wireless devices. A growing number of portable devices can receive email wirelessly. The service providers generally sell wireless connectivity by the hour or by the megabyte. When an email address connected to one of these devices is compromised, it is a real problem because the recipient must then download UME on a slow and expensive connection. Even if a UME message is identified as such, it still has to be downloaded.
In summary, existing filtering systems typically require emails to be downloaded and processed. Many corporate filtering systems are server-based, but they merely identify UME messages and tag them as such, so the tagged messages still clog the mailbox and network connection of the corporate recipient.
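A back-of-the-envelope calculation gives a sense of the download cost described in the two items above. The figures used here (50 UME messages, 20,000 bytes each, a 56 kbit/s dial-up link) are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def download_seconds(message_count: int, avg_size_bytes: int,
                     link_bits_per_second: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    total_bits = message_count * avg_size_bytes * 8
    return total_bits / link_bits_per_second


# Assumed values for illustration only.
DIALUP_BPS = 56_000      # nominal dial-up modem rate
UME_COUNT = 50           # unwanted messages waiting in the mailbox
AVG_SIZE_BYTES = 20_000  # average size of one UME message

seconds = download_seconds(UME_COUNT, AVG_SIZE_BYTES, DIALUP_BPS)
megabytes = UME_COUNT * AVG_SIZE_BYTES / 1_000_000

print(f"{seconds:.0f} s to download {megabytes:.1f} MB of UME")
# prints "143 s to download 1.0 MB of UME"
```

Under these assumptions the recipient spends over two minutes, or one billable megabyte, on messages that are discarded anyway; larger image-based UME would raise both figures.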