Electronic networks provide a vast capability to communicate. The list of network communication methods, such as email, SMS, and voice-over-IP, continues to expand as this resource matures. Along with the inherent advantages of electronic communication has come the bane of unwanted or unsolicited messages. Virtually every user of electronic mail is a target of such messages, often referred to as junk email, unsolicited bulk email (“UBE”), unsolicited commercial email (“UCE”), or spam (hereafter referred to collectively as unsolicited bulk electronic messages). As technology has evolved, methods have been developed to control these undesired messages. Senders of bulk messages, however, have responded by using various techniques to obscure the source of the UBE they send, thereby avoiding the identification that would lead to limits on their activities.
Conventionally, electronic messages include a header section made up of multiple required and optional lines of information, including the source and destination addresses of a message. Typical required lines include From, To, Message-ID, and X-Mailer. Optional lines include Reply-To, Organization, and Return-Path. Additional required header lines, denominated Received, are added to the message as it passes through the mailer sub-systems of Internet Service Providers (“ISPs”) and other computer systems on the way to a destination domain user. These Received lines are nominally beyond the control of individuals creating bulk messages.
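As a minimal illustration of the header structure described above, the following Python sketch reads the sender-supplied lines and the relay-added Received lines from a raw message using the standard library `email` package. The sample message, its addresses, and the helper name `summarize_headers` are hypothetical and chosen only for illustration.

```python
from email import message_from_string

# Hypothetical sample message; every address and host name is invented.
RAW_MESSAGE = """\
Received: from relay.example.net (relay.example.net [192.0.2.10])
\tby mx.example.org with SMTP; Mon, 1 Jan 2024 10:00:05 +0000
Received: from sender.example.com (sender.example.com [198.51.100.7])
\tby relay.example.net with SMTP; Mon, 1 Jan 2024 10:00:01 +0000
From: offers@example.com
To: user@example.org
Message-ID: <abc123@example.com>
Reply-To: replies@example.com
Subject: Hello

Body text.
"""


def summarize_headers(raw):
    """Return the sender-supplied lines and the relay-added Received lines."""
    msg = message_from_string(raw)
    return {
        # Lines written by the sending software (and so easy to falsify).
        "from": msg["From"],
        "to": msg["To"],
        "message_id": msg["Message-ID"],
        "reply_to": msg["Reply-To"],
        # One Received line per relay; the most recent hop appears first.
        "received": msg.get_all("Received", []),
    }


summary = summarize_headers(RAW_MESSAGE)
for hop in summary["received"]:
    print(hop.split(";")[0].strip())
```

Each relay prepends its own Received line, so reading the list from last to first retraces the path from the originating host toward the recipient.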
Senders of bulk messages use anonymous mailers and redistributors to obscure the required email header lines of their UBE by specifying non-existent email systems and accounts. Modified mailers can be used to remove header lines completely or to substitute addresses of known valid electronic message accounts that are not actually associated with the sender of the bulk message.
These techniques are generally sufficient to prevent UBE recipients from identifying, and complaining to the postmaster of the relevant ISP about, the activities of an individual or company sending bulk messages. While the Received lines provide traceable information, the effort of sifting through this information is usually more than most UBE recipients will undertake. Even when some recipients do so, the number of complaints actually received by the ISP is significantly lower than the number of bulk messages that transit the ISP, often allowing the sender of bulk messages to remain in operation for a significant length of time before being forced to find a new ISP for their activities.
Many techniques have been developed in the recent past to deal with the growing amount of UBE being received by network users. These techniques primarily include email client systems supporting manual email accept and reject lists, automated context analysis, use of public shared lists of known spam sources, and direct challenge systems. None is completely effective, and all impose, to varying degrees, additional operating complexity on the email client user.
For instance, some email servers examine every email for specific language that would indicate the email is undesirable (such as “sex” or “make money”). This can be a problem when the email must be opened (which may trigger a virus) and, in any event, requires processing power, which has an attendant cost to the organization operating the server. There have also been attempts at heuristic protocols, which examine the contents or other information contained in the message. These approaches also delay the delivery of email while the message is examined, particularly at a large organization that may receive a considerable number of messages in a short period.
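The keyword examination described above can be sketched in a few lines of Python. This is only an assumed, simplified form of such a filter: the phrase list reuses the two examples from the text, and the function name `contains_blocked_phrase` is hypothetical. Note that the whole message body has to be read to produce an answer, which is the processing cost the paragraph refers to.

```python
import re

# Example phrases taken from the text; a real deployment would use a longer list.
BLOCKED_PHRASES = ("sex", "make money")


def contains_blocked_phrase(body, phrases=BLOCKED_PHRASES):
    """Return True if any blocked phrase occurs as whole words in the body."""
    text = body.lower()
    for phrase in phrases:
        # Allow any run of whitespace between the words of a phrase and
        # require word boundaries so "Sussex" does not match "sex".
        words = (re.escape(word) for word in phrase.split())
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        if re.search(pattern, text):
            return True
    return False


print(contains_blocked_phrase("Make   money fast from home"))
print(contains_blocked_phrase("Minutes of the Sussex meeting"))
```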
An alternative to examining the entire message is calculating a hash value of the contents or header fields. The hash value can be stored and compared to the hash values of subsequent messages. When the values match, the existence of UBE is concluded and future messages with the same hash value are blocked. The advantage of such a technique is the relatively low processing time required to perform a hash function, but the technique does have limitations. Only exact copies can be identified: any modification of the message, however slight, will result in the bulk message not being recognized.
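The following Python sketch illustrates the hash comparison and its limitation. The choice of SHA-256 and the class name `HashFilter` are assumptions made for the example; the text does not name a particular hash function or data structure.

```python
import hashlib


class HashFilter:
    """Remember message digests and flag any message whose digest repeats."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def digest(content):
        # SHA-256 is an assumed choice; any stable hash function would serve.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def check(self, content):
        """Return True if an identical message was seen before, else store it."""
        value = self.digest(content)
        if value in self._seen:
            return True
        self._seen.add(value)
        return False


hash_filter = HashFilter()
print(hash_filter.check("Buy now, limited offer"))   # first sighting: stored
print(hash_filter.check("Buy now, limited offer"))   # exact copy: matched
print(hash_filter.check("Buy now, limited offer!"))  # one character added: missed
```

The third call shows the weakness noted above: a single added character yields an unrelated digest, so the altered copy is treated as a new message.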
Another technique to discover UBE is context analysis. Automated context analysis relies on key word usage and various patterns of advertisement pitches to discern UBE from other email. Suspected UBE is automatically discarded or, more typically, directed to an alternate email in-box of the client. Since the analysis is not, and as a practical matter cannot be, perfect, desired email may be wrongly characterized. Therefore, the user is generally required to review the messages in the alternate email in-box manually anyway. Thus, this technique functions only as an imperfect segregating filter against UBE, rather than a blocking filter.
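A toy version of such a segregating filter is sketched below in Python. The key words, weights, threshold, and function names are all hypothetical values chosen for illustration; real context analysis uses far richer patterns. The last example shows how a desired message can be wrongly routed to the alternate in-box.

```python
import re

# Hypothetical key word weights and threshold, invented for this sketch.
SPAM_WEIGHTS = {"free": 1.0, "winner": 2.0, "offer": 1.0, "guarantee": 1.5}
THRESHOLD = 2.5


def score(text):
    """Sum the weights of every weighted key word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(SPAM_WEIGHTS.get(word, 0.0) for word in words)


def route(text):
    """Segregate rather than block: suspected UBE goes to an alternate in-box."""
    return "suspect" if score(text) >= THRESHOLD else "inbox"


print(route("You are a winner! Claim your free offer"))
print(route("Lunch tomorrow?"))
# A legitimate message that happens to use several weighted words:
print(route("Free parking offer and warranty guarantee for the conference"))
```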
There remains a clear need for an efficient method to identify unsolicited bulk electronic messages. It would be desirable to identify bulk electronic messages before their arrival at a client site despite minor alterations in either the message's header or content. The present invention addresses these and other problems and provides additional benefits.