1. Field of the Invention
The present invention relates generally to electronic mail (e-mail) systems and, more particularly, to improved methodology for filtering e-mail messages sent from various host domains.
2. Description of the Background Art
Today, electronic mail or “e-mail” is a pervasive, if not the most predominant, form of electronic communication. FIG. 1A illustrates the basic architecture of a typical electronic mail system 10. At a high level, the system includes a mail server connected over a network to various e-mail “clients,” that is, the individual users of the system. More specifically, the system 10 includes one or more clients 11 connected over a network to at least one SMTP (Simple Mail Transfer Protocol) server or “Mail Transport Agent” (MTA) 12a for routing e-mail. Users write, send, and read e-mail via Mail User Agents (MUA), such as Microsoft Outlook™, present at each client (computer). To send e-mail, an MUA connects to an MTA which receives the e-mail and routes it to another MTA. An intermediary MTA might forward the e-mail to yet another MTA until the e-mail reaches the destination system, where the e-mail is stored in a mailbox accessible by the recipient.
A typical e-mail delivery process is as follows. In the following scenario, Larry sends e-mail to Martha at her e-mail address: martha@example.org. Martha's Internet Service Provider (ISP) uses an MTA, such as provided by Sendmail® for NT, available from Sendmail, Inc. of Emeryville, Calif. (With a lower case “s,” “sendmail” refers to Sendmail's MTA, which is one component of the Sendmail® Switch product line.)
    1. Larry composes the message and chooses Send in Microsoft Outlook Express (a “Mail User Agent” or MUA). The e-mail message itself specifies one or more intended recipients (i.e., destination e-mail addresses), a subject heading, and a message body; optionally, the message may specify accompanying attachments.
    2. Microsoft Outlook Express queries a DNS server for the IP address of the local mail server running sendmail. The DNS server translates the domain name into an IP address, e.g., 10.1.1.1, of the local mail server.
    3. Microsoft Outlook Express opens a TCP/IP connection to the local mail server running sendmail. The message is transmitted to the local sendmail server using the SMTP protocol (e.g., as defined in RFC 821).
    4. sendmail queries a DNS server for the MX record of the destination domain, i.e., example.org. The DNS server returns a hostname (e.g., mail.example.org) for the destination domain. sendmail queries a DNS server for the A record of mail.example.org (i.e., the IP address). The DNS server returns an IP address of, for example, 127.118.10.3.
    5. sendmail opens a TCP/IP connection to the remote mail server providing e-mail service for example.org, which is also running sendmail. The message is transmitted to the remote sendmail mail server using the SMTP protocol.
    6. sendmail delivers Larry's message for Martha to the local delivery agent. It appends the message to Martha's mailbox. By default, the message is stored in (e.g., using a sample file path on a UNIX system):
            /var/spool/mail/martha.
    7. Martha has her computer dial into her ISP.
    8. Martha chooses “Check Mail” in Eudora.
    9. Eudora opens a POP3 (Post Office Protocol version 3, defined in RFC 1725) connection with the POP3 (incoming mail) server. Eudora downloads Martha's new messages, including the message from Larry.
    10. Martha reads Larry's message.
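By way of illustration only, the SMTP exchange of steps 3 and 5 above may be sketched as the sequence of commands a client issues under RFC 821. The following is a minimal sketch, not the sendmail implementation: the function name smtp_commands, the client host name, and Larry's address (larry@example.com) are assumptions introduced for the example, and server replies are omitted.

```python
def smtp_commands(client_host, sender, recipients, message_lines):
    """Return the RFC 821 command sequence a client would send for one message.

    Hypothetical helper for illustration; server replies are not modeled.
    """
    commands = ["HELO " + client_host, "MAIL FROM:<%s>" % sender]
    # One RCPT TO command per intended recipient.
    commands += ["RCPT TO:<%s>" % rcpt for rcpt in recipients]
    commands.append("DATA")
    # RFC 821 transparency: a body line starting with "." gets an extra ".".
    for line in message_lines:
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")   # a line holding a single "." ends the message
    commands.append("QUIT")
    return commands


if __name__ == "__main__":
    for command in smtp_commands(
        "client.example.com",          # assumed client host name
        "larry@example.com",           # assumed sender address
        ["martha@example.org"],
        ["Subject: Hello", "", "Hi Martha"],
    ):
        print(command)
```

The same command sequence applies whether the sender is an MUA handing the message to its local MTA or one MTA relaying to another; only the connection endpoints differ.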
The MTA, which is responsible for queuing up messages and arranging for their distribution, is the workhorse component of electronic mail systems. The MTA “listens” for incoming e-mail messages on the SMTP port, which is generally port 25. When an e-mail message is detected, the MTA handles the message according to configuration settings, that is, the settings chosen by the system administrator, in accordance with relevant standards such as Request For Comment documents (RFCs). Typically, the mail server or MTA must temporarily store incoming and outgoing messages in a queue, the “mail queue.” Actual queue size is highly dependent on one's system resources and daily volumes.
MTAs, such as the commercially-available Sendmail® MTA, perform three key mail transport functions:
    1. Route mail across the Internet to an MTA serving a different network or “domain” (since many domains can and do exist in a single network);
    2. Relay mail to another MTA (e.g., MTA 12b on FIG. 1A) on a different subnet within the same network;
    3. Transfer mail from one host or server to another on the same network subnet.
To perform these functions, an MTA accepts messages from other MTAs or MUAs, parses addresses to identify recipients and domains, resolves aliases, fixes addressing problems, copies mail into a queue on its hard disk, tries to process long and hard-to-pass messages, and notifies the sender when a particular task cannot be successfully completed. The MTA does not store messages (apart from its queue) or help users access messages. It relies on other mail system components, such as message delivery agents, message stores and mail user agents (MUAs), to perform these tasks. These additional components can belong to any number of commercial or free products (e.g., POP3 or IMAP servers, Microsoft Exchange, IBM Lotus Notes, Netscape, cc:Mail servers, or the like). Because of its central role in e-mail systems, however, the MTA often serves as the “glue” that makes everything appear to work together seamlessly.
The overall process may be summarized as follows. E-mail is routed via SMTP servers, the so-called “Mail Transport Agents” (MTA). Users write, send, and read e-mail via Mail User Agents (MUA). To send e-mail, an MUA connects to an MTA, which receives the e-mail and routes it to another MTA. An intermediary MTA might forward the e-mail to yet another MTA until the e-mail reaches the destination system, where the e-mail is stored in a mailbox accessible by the recipient.
For further description of e-mail systems, see e.g., Sendmail® for NT User Guide, Part Number DOC-SMN-300-WNT-MAN-0999, available from Sendmail, Inc. of Emeryville, Calif., the disclosure of which is hereby incorporated by reference. Further description of the basic architecture and operation of e-mail systems is available in the technical and trade literature, see e.g., the following RFC (Request For Comments) documents:
    RFC 821              Simple Mail Transfer Protocol (SMTP)
    RFC 822              Standard for the Format of ARPA Internet Text Messages
    RFC 974              Mail Routing and the Domain System
    RFC 937, RFC 1081    Post Office Protocol version 3 (POP3)
    RFC 1123             Requirements for Internet Hosts -- Application and Support
    RFC 1725             Post Office Protocol version 3 (POP3)
    RFC 2033             Local Mail Transfer Protocol (LMTP)
    RFC 2060, RFC 2061   Internet Message Access Protocol (IMAP)
    RFC 2246             The TLS Protocol, version 1.0
    RFC 2487             SMTP Service Extension for Secure SMTP over TLS
RFCs are numbered Internet informational documents and standards widely followed by commercial software and freeware in the Internet and UNIX communities. The RFCs are unusual in that they are floated by technical experts acting on their own initiative and reviewed by the Internet at large, rather than formally promulgated through an institution such as ANSI. For this reason, they remain known as RFCs even once they are adopted as standards. The above-listed RFC documents are currently available via the Internet (e.g., at www.ietf.org/rfc), the disclosures of which are hereby incorporated by reference.
During operation of the Sendmail MTA, a listening process operates to detect requests for new connections. When a request for a new connection arrives, Sendmail makes a new copy or instance of itself through a “forking” operation. The “forked” or new process deals with the new connection exclusively. Thus, at the conclusion of the forking operation two processes exist: a listening process to detect requests for new connections, and a forked process which was created for the purpose of exclusively handling a particular new connection. This forking operation may be repeated to spawn other child processes, each one for exclusively handling a particular new connection.
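The listen-and-fork pattern described above may be sketched as follows. This is a minimal illustration on a UNIX-like system, not the Sendmail source: the function name accept_and_fork and the greeting text are assumptions, and a real MTA would conduct a full SMTP session in the child rather than send a single line.

```python
import os
import socket


def accept_and_fork(listener):
    """Accept one connection and fork a child to handle it exclusively.

    Hypothetical helper for illustration; returns the child's process ID.
    """
    conn, _peer = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child process: it owns this connection and stops listening.
        try:
            listener.close()
            conn.sendall(b"220 child %d ready\r\n" % os.getpid())
            conn.close()
        finally:
            os._exit(0)   # never fall back into the parent's code path
    # Parent process: release its copy of the connection and keep listening.
    conn.close()
    return pid


if __name__ == "__main__":
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # any free local port, for the demo
    listener.listen(5)
    client = socket.create_connection(listener.getsockname())
    child_pid = accept_and_fork(listener)
    print(client.recv(100))
    os.waitpid(child_pid, 0)
```

Calling accept_and_fork in a loop yields one child per connection, with the listening process left free to accept the next request.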
Each child process that is created has no knowledge of the other child processes (i.e., no memory access to the data structures of the other child processes). This lack of knowledge of other child processes leads to system vulnerability. For example, the child processes cannot detect that the system is being “slammed” by many connections from a particular host. In this scenario, only the parent process (i.e., the process spawning the child processes) would be able to know (i.e., be able to maintain information) about the source of all of the connections to the system. Once a forking operation occurs, there is no interprocess communication between the child processes that would allow these processes to detect the foregoing slamming scenario.
FIG. 1B illustrates this problem in further detail. In the figure, Parent process 101 contains a variable A having an initial value of 1 (i.e., at Time T0). Suppose, at Time T1, a child process, Child 1 (shown at 105), is created using the forking operation. At the instant that Child 1 is created, the entire memory space of the Parent is copied to Child 1. Thus, at Time T1, Child 1 also contains a variable A having a value of 1, as shown. Suppose, at Time T2, that the Parent process 101 changes the value of the variable A to 2. The value of the variable A in Child 1 (105) is not changed, as the two processes are now separate. Proceeding with the example, at Time T3, Child 2 (shown at 107) is created, again using the forking operation. As before, the newly created Child process receives a copy of the then-current memory space of the Parent. Thus, Child 2 (107) contains a variable A having a value of 2, as shown. Note in particular that the first Child process, Child 1, has no knowledge that the current value of A (whatever significance it may have) in the parent is now 2. The second Child process, Child 2, on the other hand, does know that the current value of A in the parent is 2, by virtue of the fact that Child 2 was created at a later point in time (i.e., Time T3).
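The behavior shown in FIG. 1B can be reproduced on a UNIX-like system with a short program. The following is a minimal sketch under stated assumptions: the helper names spawn_child, ask_child, and demo are hypothetical, and pipes are used only so that each child can report the value it holds; the children still share no memory with one another or with the parent.

```python
import os


def spawn_child(state):
    """Fork a child that waits for a request, then reports its copy of A."""
    go_r, go_w = os.pipe()     # parent -> child: "report now"
    out_r, out_w = os.pipe()   # child -> parent: the child's value of A
    pid = os.fork()
    if pid == 0:
        # Child process: it holds the copy of state made at fork time.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)   # block until the parent asks
            os.write(out_w, str(state["A"]).encode())
        finally:
            os._exit(0)
    os.close(go_r)
    os.close(out_w)
    return pid, go_w, out_r


def ask_child(child):
    """Ask a child for its value of A and wait for the child to exit."""
    pid, go_w, out_r = child
    os.write(go_w, b"x")
    value = int(os.read(out_r, 16))
    os.close(go_w)
    os.close(out_r)
    os.waitpid(pid, 0)
    return value


def demo():
    state = {"A": 1}               # Time T0: Parent holds A = 1
    child1 = spawn_child(state)    # Time T1: Child 1 copies A = 1
    state["A"] = 2                 # Time T2: Parent changes A to 2
    child2 = spawn_child(state)    # Time T3: Child 2 copies A = 2
    return ask_child(child1), ask_child(child2), state["A"]


if __name__ == "__main__":
    print(demo())
```

Child 1 is queried only after the Parent has changed A, yet it still reports the value it received at Time T1, which is the isolation the figure depicts.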
Without the establishment of interprocess communication (IPC) between the processes (e.g., shared memory, UNIX-style pipes, or the like), there is no knowledge shared among the processes about the current state of each process' data. In the context of an e-mail system, if there are several connections coming from a single host (for which the Parent process has created several Child processes), there is no shared knowledge among the processes to indicate which process is handling what connection. Thus, the system cannot readily determine that, for example, it may be handling ten incoming connections for a single domain (e.g., AOL.com). At the same time, however, the system may in fact benefit from identifying that scenario so that the system can moderate usage of its resources by various domains. For instance, if a particular domain is hogging the resources of the system (e.g., “slamming” the system with a multitude of e-mail messages), the system would want to identify that situation and take corrective action.
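Because only the parent process sees every incoming connection, any per-host tally of the kind discussed above would have to be kept there or shared through IPC. The following is a minimal sketch of such parent-side bookkeeping, offered only to make the problem concrete; the class name HostConnectionTracker, its methods, and the max_per_host limit are all assumptions and are not drawn from any existing e-mail system.

```python
from collections import Counter


class HostConnectionTracker:
    """Parent-side count of active child processes per connecting host.

    Hypothetical bookkeeping for illustration only.
    """

    def __init__(self, max_per_host):
        self.max_per_host = max_per_host
        self.active = Counter()    # host -> number of live children
        self.child_host = {}       # child pid -> host it is serving

    def admit(self, host):
        """Return True if another connection from this host is acceptable."""
        return self.active[host] < self.max_per_host

    def child_started(self, pid, host):
        """Record that a child was forked to serve a connection from host."""
        self.child_host[pid] = host
        self.active[host] += 1

    def child_exited(self, pid):
        """Record that a child finished, freeing its host's slot."""
        host = self.child_host.pop(pid, None)
        if host is None:
            return
        self.active[host] -= 1
        if self.active[host] <= 0:
            del self.active[host]


if __name__ == "__main__":
    tracker = HostConnectionTracker(max_per_host=2)
    tracker.child_started(101, "10.0.0.5")
    tracker.child_started(102, "10.0.0.5")
    print(tracker.admit("10.0.0.5"))   # host is at its limit
    print(tracker.admit("10.0.0.9"))   # a different host is unaffected
```

In this sketch the parent would call child_started after each fork and child_exited when it reaps a child, so the children themselves never need to communicate.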
To date, efforts at addressing the foregoing problem have not provided a domain-specific solution and, therefore, have been sub-optimal. For example, existing e-mail systems may be configured to limit the number of child processes that exist at any given instant in time. That approach, however, simply provides a general limit on system resources. The approach does not address over-utilization or abuse of system resources by a particular domain.
Given the ever-increasing reliance on e-mail as a preferred medium for business and personal communication, there is much interest in improving the performance and reliability of e-mail systems. Accordingly, there is a need for an e-mail system that incorporates methodology for moderating usage of its resources on a per-domain basis. The present invention fulfills this and other needs.