The present invention relates to methodologies for detecting patterns in received e-mail messages on a computer system. In particular, the present invention describes a method for detecting undesired e-mail usage based on the pattern of received e-mail messages on a computer system.
Detection of an undesired pattern of e-mail messages is the first step in reducing or eliminating the volume of undesired e-mail messages received by a computer system or server. Once detection is accomplished, a policy can be set on the computer system or server to filter out the sources or types of e-mail messages.
Traditional methods for reducing or eliminating the volume of undesired e-mail messages focus on filtering techniques applied once sources or types of undesired e-mail have been identified. These methods assume that sources or types of undesired e-mail are given, and/or identify sources or types on a per-e-mail basis. For example, one technique that makes the determination on a per-e-mail basis involves analyzing the header of a particular e-mail to determine whether that e-mail was sent in a way that hides its true origin. A filter may then be used in conjunction with this technique to disregard any e-mail message hiding its true origin.
Format of Internet E-mail Messages
The 821 Header
The 821 header is attached to e-mail messages and contains routing information for the e-mail. It contains the commands and replies exchanged at the Simple Mail Transfer Protocol ("SMTP") level before transmission of the e-mail message.
SMTP is based on a model of communication in which, as a result of a user e-mail request, a sender-SMTP establishes a two-way transmission channel with a receiver-SMTP. The receiver-SMTP may be the ultimate destination, or just an intermediary. In the transmission channel, SMTP commands are generated by the sender-SMTP and sent to the receiver-SMTP. The receiver-SMTP sends SMTP replies to the sender-SMTP in response to these commands.
In a typical exchange between the sender-SMTP and the receiver-SMTP, the sender-SMTP will send a "MAIL FROM" command indicating the sender of the e-mail. The receiver-SMTP will respond with an "OK" reply, if it can accept the e-mail. The sender-SMTP will then send a "RCPT" command, which identifies the recipient of the e-mail. If the receiver-SMTP can accept the e-mail message for that recipient, it will respond with an "OK" reply; if not, it will respond with a reply rejecting that recipient. Other recipients may then be negotiated. After all recipients have been negotiated, the sender-SMTP will send the data constituting the e-mail message. If the receiver-SMTP successfully receives the e-mail data, it will respond with an "OK" reply.
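The exchange described above may be illustrated by the following minimal sketch of a receiver-SMTP that replies to each command in turn. The sketch is not a working mail server. The mailbox names, the set of known recipients, and the function names are hypothetical, and the "END OF DATA" step is a stand-in for the end of the message data. The reply codes follow those defined in Request for Comments #821.

```python
# Toy receiver-SMTP: replies to a sender-SMTP's commands in sequence.
# The recipients below are hypothetical and exist only for illustration.
KNOWN_RECIPIENTS = {"alice@example.com", "bob@example.com"}

def receiver_reply(command, argument=""):
    """Return the reply a receiver-SMTP might give to one command."""
    if command == "MAIL FROM":
        return "250 OK"
    if command == "RCPT TO":
        # Accept or reject each recipient individually.
        if argument in KNOWN_RECIPIENTS:
            return "250 OK"
        return "550 No such user here"
    if command == "DATA":
        return "354 Start mail input; end with <CRLF>.<CRLF>"
    if command == "END OF DATA":
        # Stand-in for the terminating line of the message data.
        return "250 OK"
    return "500 Syntax error, command unrecognized"

def run_exchange(sender, recipients):
    """Play the sender-SMTP side and record (command, argument, reply)."""
    steps = [("MAIL FROM", sender)]
    steps += [("RCPT TO", recipient) for recipient in recipients]
    steps += [("DATA", ""), ("END OF DATA", "")]
    return [(cmd, arg, receiver_reply(cmd, arg)) for cmd, arg in steps]

transcript = run_exchange(
    "carol@example.org",
    ["alice@example.com", "mallory@example.com"],
)
for command, argument, reply in transcript:
    print(f"{command} {argument}".strip(), "->", reply)
```

In this sketch the first recipient is accepted and the second is rejected, after which the message data is still transferred for the accepted recipient.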
For an e-mail which is successfully transmitted, the command and reply sequence in the transmission channel becomes part of the e-mail, forming the 821 header for that e-mail message. This header is comprised of fields of text, where each field represents a command or reply in the sequence. Additional details on SMTP commands and format can be found in the Internet standard document "Request for Comments #821, Simple Mail Transfer Protocol," Jonathan B. Postel (1982).
The 822 Header
Text messages sent by e-mail may be viewed as having an envelope and contents. The contents of an e-mail text message comprise the data sought to be conveyed to the recipient. The envelope contains information needed to accomplish transmission and delivery of the contents. This envelope is comprised of a header and fields within the header, where each field contains two sub-fields, a field-name and a field-body. The field-name specifies the name of the field, whereas the field-body contains the content of that field for that e-mail message.
The header which is a part of the e-mail message ("the 822 header") is different from and in addition to the 821 header discussed earlier. The 821 header is used for mail routing, whereas the 822 header contains envelope information for an e-mail subscriber.
Typical 822 header fields include a "to" field containing the e-mail address of the receiving subscriber, "cc" and "bcc" fields containing addresses of subscribers to which copies of the e-mail message are sent, a "subject" field which may include a sending-subscriber text string identifying the subject of the e-mail message, and other fields. Formatting and additional details of the 822 header are discussed in the Internet standard document "Request for Comments #822, Standard for the Format of ARPA Internet Messages," David H. Crocker (1982).
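The field-name and field-body structure of the 822 header may be illustrated with a short sketch that uses the Python standard library `email` package to split a message into its header fields and its contents. The addresses and the message text are hypothetical.

```python
from email import message_from_string

# A hypothetical message: header fields, a blank line, then the contents.
raw = (
    "From: sender@example.org\n"
    "To: subscriber@example.com\n"
    "Cc: copy@example.com\n"
    "Subject: System status\n"
    "\n"
    "All systems are reachable.\n"
)

message = message_from_string(raw)

# Each header field is a (field-name, field-body) pair.
fields = {name.lower(): body for name, body in message.items()}

for name, body in fields.items():
    print(f"{name}: {body}")
print("contents:", message.get_payload().strip())
```

Field-names are lowercased here only so that later lookups do not depend on how the sender capitalized them.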
What constitutes undesired e-mail usage may vary depending on the e-mail policies implemented on a specific computer system or server. One general characteristic is that such undesired usage usually produces a large number of unwanted e-mail messages which tax system resources. Undesired e-mail is not always generated by a malicious user; such e-mail may be generated unintentionally by users or even systems. For purposes of illustration, the following examples will assume that the environment comprises a wireless telephonic service provider, a gateway operated by the wireless telephonic service provider (the mobile device gateway), remote gateways not part of the wireless telephonic service provider, and subscribers with mobile devices capable of communicating with the gateways operated by the wireless telephonic service provider through the remote gateways. In the following examples, the point of view of the wireless telephonic service provider is taken in considering what constitutes undesired e-mail usage.
A subscriber put in place an automatic notification system which sent e-mail to his/her mobile device when his/her system was unreachable by his/her monitoring system. The system had a failure which caused this monitoring check to trigger and send e-mail messages stating that the system was unreachable. Unfortunately, this caused thousands of e-mail messages to be sent in a short amount of time to the mobile device gateway. These e-mail messages contained the same information. Such e-mail usage is undesirable.
This example is similar to example 1, except that the monitoring trigger was based on the status of a database instead of whether the system was reachable.
A system administrator frequently mailed information on the health of the system to a number of mobile users, regardless of the condition of the system. The average e-mail message load was approximately 600 messages per hour. Such e-mail usage is undesirable.
A paging service was unable to use the blind copy feature to copy several recipients on an e-mail message. Therefore, the service sent the same e-mail message to these recipients, one at a time. Such e-mail usage could be deemed undesirable.
An e-mail message may be relayed through a gateway although the e-mail message is not destined to or sourced from that gateway. Consequently, a system attached to the gateway may be used contrary to its designated purpose. Use of the system by such e-mail is undesirable.
Mail bombing comprises sending continuous e-mail messages to a destination from one or several sources. It is an unacceptable attempt to disable an e-mail system or e-mail account. Such use of e-mail is undesirable.
Some invaders of a system may attempt to pipe commands for execution through an e-mail server. For example, invaders have attempted to use e-mail servers to pipe unauthorized Telnet sessions out from the system for their use. Such use of e-mail is undesirable.
An unsolicited e-mail message was sent to subscribers of a wireless system, using a number generator that incremented the user field. This caused a number of e-mail messages with the same content to be sent to subscribers. Such e-mail usage is undesirable.
Known Solutions
Most of the work in reducing or eliminating undesired e-mail has been performed in the area of filtering.
Origin-Based Heuristic Filtering
Heuristic filtering presumes that an undesired e-mail can be detected without the cooperation of the originator. In origin-based heuristic filtering, entire groups of originators of undesired e-mail, usually everyone from a particular Internet Service Provider ("ISP") or a particular domain, are distinguished from other users. Origin filtering prevents e-mail from such groups from being saved in the destination host's message store. This method relies on originators using the same or similar addresses each time they send undesired e-mail. Origin-based heuristic filtering can be implemented in various ways:
(i) One implementation of origin-based heuristic filtering may be performed at the Internet Protocol ("IP") layer. In this implementation, routers at the local site are instructed to not route IP packets from a list of addresses corresponding to known originators of undesired e-mail. This implementation assumes that the identities of originators are already known.
(ii) A second implementation of origin-based heuristic filtering may be performed at the Transmission Control Protocol ("TCP") layer. In this implementation, a Simple Mail Transfer Protocol ("SMTP") server is configured to look up the IP address or domain name of an originator as it connects to the SMTP server. If the originator is on a list of known, prohibited sites, the SMTP server can refuse to accept any SMTP commands. This implementation of filtering is performed immediately after the TCP connection is opened, before any SMTP commands are exchanged. This implementation assumes that the identities of originators are already known.
(iii) A third implementation of origin-based heuristic filtering may be performed at the SMTP layer. In this implementation, the receiving SMTP server can check the domain name of a sending SMTP server during execution of the SMTP "MAIL FROM" command. The receiving SMTP server can refuse to receive a message if the domain name is on a list of prohibited sites. This filtering is performed before any message text is transmitted. This implementation assumes that the identities of originators are already known.
(iv) A fourth implementation of origin-based heuristic filtering may be performed on an e-mail message by comparing the IP address specified in the SMTP "MAIL FROM" command with the IP address of the TCP connection. If the two IP addresses do not match, the SMTP server can refuse to receive the message. This implementation determines the identity of the originator of undesired e-mail on a per-e-mail basis.
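Implementations (ii) and (iii) above both reduce to a lookup against a list of known originators. The following minimal sketch shows one such lookup at connection time and one at the "MAIL FROM" command. The blocked network, the blocked domain, and the function names are hypothetical, and exact domain matching is an assumption of this sketch. Implementation (iv) is not covered.

```python
import ipaddress

# Hypothetical lists of known originators of undesired e-mail.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_DOMAINS = {"bulk.example"}

def accept_connection(peer_ip):
    """TCP-layer check: run as soon as the connection is opened."""
    address = ipaddress.ip_address(peer_ip)
    return not any(address in network for network in BLOCKED_NETWORKS)

def accept_mail_from(mailbox):
    """SMTP-layer check: run on the argument of the MAIL FROM command."""
    domain = mailbox.rsplit("@", 1)[-1].lower()
    return domain not in BLOCKED_DOMAINS

print(accept_connection("198.51.100.7"))
print(accept_connection("203.0.113.9"))
print(accept_mail_from("news@bulk.example"))
print(accept_mail_from("user@example.org"))
```

Both checks presuppose that the lists already exist, which is the limitation noted for these implementations.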
Message-Based Heuristic Filtering
Message-based heuristic filtering attempts to identify undesired e-mail by analyzing segments of the received e-mail message such as special content, headers, addressing style, and sender address. This type of filtering may occur in the message store before the recipient has retrieved the message, or in the recipient's mail client as the recipient retrieves the message. This type of filtering determines the identity of the originator of undesired e-mail on a per-e-mail basis.
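A per-e-mail analysis of this kind may be sketched as a set of rules applied to the header of a single message. The two rules below (a list of suspect subject phrases and a check for a missing "to" field) are hypothetical examples chosen for illustration and are not rules taken from any particular filter.

```python
from email import message_from_string

# Hypothetical phrases treated as suspect when found in the subject field.
SUSPECT_PHRASES = ("free offer", "act now")

def undesired_reasons(raw_message):
    """Return the reasons, if any, that one message looks undesired."""
    message = message_from_string(raw_message)
    subject = (message.get("Subject") or "").lower()
    reasons = []
    # Content rule: the subject contains a suspect phrase.
    if any(phrase in subject for phrase in SUSPECT_PHRASES):
        reasons.append("suspect subject")
    # Addressing-style rule: the message names no visible recipient.
    if message.get("To") is None:
        reasons.append("no To field")
    return reasons

bulk = "From: a@example.org\nSubject: FREE OFFER inside\n\nBuy now.\n"
ordinary = (
    "From: b@example.org\nTo: c@example.com\nSubject: Meeting notes\n\nHi.\n"
)
print(undesired_reasons(bulk))
print(undesired_reasons(ordinary))
```

Each message is judged in isolation, so a rule of this kind cannot see that many messages share the same content.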
Cooperative Filtering
This type of filtering depends on cooperation between originators and recipients of undesired e-mail. In content labeling-type cooperative filtering, messages may contain additional originator-supplied information, such as the type of information contained in the e-mail message. The recipient may then eliminate undesired e-mail by performing filtering on the originator-supplied information.
In recipient-registration type cooperative filtering, recipients of undesired e-mail can register with senders of such e-mail. The senders may then refrain from sending e-mail to registered recipients.
In both of these types of filtering, the sender of undesired e-mail identifies himself/herself as such; therefore, the identity of the originator of undesired e-mail is known beforehand.
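Content labeling-type cooperative filtering may be sketched as a check on a single originator-supplied header field. The field name "X-Content-Label" and the label values are hypothetical; no standard label field is assumed.

```python
from email import message_from_string

# Hypothetical originator-supplied field and the labels a recipient rejects.
LABEL_FIELD = "X-Content-Label"
UNWANTED_LABELS = {"advertisement"}

def is_wanted(raw_message):
    """Keep a message unless its originator labeled it as unwanted content."""
    message = message_from_string(raw_message)
    label = (message.get(LABEL_FIELD) or "").strip().lower()
    return label not in UNWANTED_LABELS

labeled = "To: c@example.com\nX-Content-Label: Advertisement\n\nSale today.\n"
unlabeled = "To: c@example.com\nSubject: Hello\n\nHi.\n"
print(is_wanted(labeled))
print(is_wanted(unlabeled))
```

An unlabeled message passes the filter, which reflects the dependence of this approach on the originator's cooperation.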
Other Methods
Undesired e-mail may be reduced by enacting prohibitory regulatory laws. Contractual mechanisms may also be used to reduce undesired e-mail. Requiring fees for sending e-mail would also reduce undesired e-mail.
Embodiments of the invention identify undesired e-mail messages by receiving e-mail messages, storing fields, including at least one field from the header of each received e-mail message, and analyzing the stored fields for at least one pattern indicative of undesired e-mail messages.
In one embodiment, the stored fields are analyzed using pattern recognition that involves counting the number of e-mails received which have the same or similar field content within the headers. This number can be compared to an absolute threshold number, or to the total number of messages in a sample of e-mail messages. The sample may be composed of a predetermined number of received e-mail messages, or may include e-mail messages received during a predetermined time interval. When a threshold or ratio is exceeded, an alarm is triggered to alert monitoring functions and to update the lists of known sources and types of undesired e-mail messages used for filtering.
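The counting described in this embodiment may be sketched as follows. The sketch assumes that the sample has already been selected, by count or by time interval, and that one stored field (here the subject) is examined. Treating field content as "similar" when it matches after lowercasing and collapsing whitespace is a simplifying assumption, and the function names and threshold values are hypothetical.

```python
from collections import Counter

def normalize(field_body):
    """Crude stand-in for 'same or similar': ignore case and extra spaces."""
    return " ".join(field_body.lower().split())

def detect_patterns(field_bodies, absolute_threshold, ratio_threshold):
    """Return (content, count) pairs that exceed either threshold."""
    counts = Counter(normalize(body) for body in field_bodies)
    total = len(field_bodies)
    alarms = []
    for content, count in counts.items():
        over_absolute = count > absolute_threshold
        over_ratio = total > 0 and count / total > ratio_threshold
        if over_absolute or over_ratio:
            alarms.append((content, count))
    return alarms

# Hypothetical sample of ten stored subject fields.
sample = ["System unreachable"] * 6 + ["system  UNREACHABLE"]
sample += ["Lunch?", "Report", "Hello"]

for content, count in detect_patterns(sample, 5, 0.5):
    print(f"alarm: {count} of {len(sample)} messages share {content!r}")
```

The returned pairs are what would be passed to the monitoring functions and added to the filter lists. The absolute test catches a flood regardless of sample size, whereas the ratio test catches a source that dominates a small sample.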