1. Field of the Invention
The present invention relates generally to electronic mail (e-mail) systems and, more particularly, to improved methodology for processing automated e-mail messages sent to numerous recipients.
2. Description of the Background Art
Today, electronic mail or “e-mail” is a pervasive, if not the most predominant, form of electronic communication. FIG. 1 illustrates the basic architecture of a typical electronic mail system 10. At a high level, the system includes a mail server connected over a network to various e-mail “clients,” that is, the individual users of the system. More specifically, the system 10 includes one or more clients 11 connected over a network to at least one SMTP (Simple Mail Transfer Protocol) server or “Message Transfer Agent” (MTA) 12a for routing e-mail. Users write, send, and read e-mail via Mail User Agents (MUA), such as Microsoft Outlook™, present at each client (computer). To send e-mail, an MUA connects to an MTA which receives the e-mail and routes it to another MTA. An intermediary MTA might forward the e-mail to yet another MTA until the e-mail reaches the destination system, where the e-mail is stored in a mailbox accessible by the recipient.
A typical e-mail delivery process is as follows. In the following scenario, Larry sends e-mail to Martha at her e-mail address: martha@example.org. Martha's Internet Service Provider (ISP) uses an MTA, such as that provided by Sendmail® for NT, available from Sendmail, Inc. of Emeryville, Calif. (With a lower case “s,” “sendmail” refers to Sendmail's MTA, which is one component of the Sendmail® Switch product line.)
    1. Larry composes the message and chooses Send in Microsoft Outlook Express (a “Mail User Agent” or MUA). The e-mail message itself specifies one or more intended recipients (i.e., destination e-mail addresses), a subject heading, and a message body; optionally, the message may specify accompanying attachments.
    2. Microsoft Outlook Express queries a DNS server for the IP address of the local mail server running sendmail. The DNS server translates the domain name into an IP address, e.g., 10.1.1.1, of the local mail server.
    3. Microsoft Outlook Express opens an SMTP connection to the local mail server running sendmail. The message is transmitted to the local sendmail server using the SMTP protocol.
    4. sendmail queries a DNS server for the MX record of the destination domain, i.e., example.org. The DNS server returns a hostname, e.g., mail.example.org. sendmail queries a DNS server for the A record of mail.example.org, i.e., the IP address. The DNS server returns an IP address of, for example, 127.118.10.3.
    5. sendmail opens an SMTP connection to the remote mail server providing e-mail service for example.org, which is also running sendmail. The message is transmitted to the remote sendmail server using the SMTP protocol.
    6. The remote sendmail delivers Larry's message for Martha to the local delivery agent. The local delivery agent appends the message to Martha's mailbox. By default, the message is stored in (e.g., using a sample file path on a UNIX system):
            /var/spool/mail/martha.
    7. Martha has her computer dial into her ISP.
    8. Martha chooses “Check Mail” in Eudora.
    9. Eudora opens a POP3 (Post Office Protocol version 3, defined in RFC1725) connection with the POP3 (incoming mail) server. Eudora downloads Martha's new messages, including the message from Larry.
    10. Martha reads Larry's message.
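The two DNS lookups in step 4 of the scenario above can be sketched as follows. This is a minimal illustration only: the dictionaries stand in for a real DNS server and reuse the example values from the text, and the function name resolve_mail_host is hypothetical rather than part of sendmail.

```python
# Hypothetical in-memory stand-ins for DNS records. The values are the
# illustrative ones from the scenario above, not results of real lookups.
MX_RECORDS = {"example.org": "mail.example.org"}
A_RECORDS = {"mail.example.org": "127.118.10.3"}


def resolve_mail_host(address):
    """Return (hostname, ip) of the mail server serving an e-mail address."""
    # The destination domain is everything after the last "@".
    _, _, domain = address.rpartition("@")
    if domain not in MX_RECORDS:
        raise LookupError(f"no MX record for {domain!r}")
    host = MX_RECORDS[domain]  # MX query: domain -> mail server hostname
    ip = A_RECORDS[host]       # A query: hostname -> IP address
    return host, ip


host, ip = resolve_mail_host("martha@example.org")
```

A real MTA would issue these queries to a DNS server and would also handle multiple MX records with different preferences; the sketch assumes exactly one record per domain.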
The MTA, which is responsible for queuing up messages and arranging for their distribution, is the workhorse component of electronic mail systems. The MTA “listens” for incoming e-mail messages on the SMTP port, which is generally port 25. When an e-mail message is detected, it handles the message according to configuration settings, that is, the settings chosen by the system administrator, in accordance with relevant standards such as Request For Comment documents (RFCs). Typically, the mail server or MTA must temporarily store incoming and outgoing messages in a queue, the “mail queue.” Actual queue size is highly dependent on one's system resources and daily volumes.
MTAs, such as the commercially-available Sendmail® MTA, perform three key mail transport functions:
    1. Route mail across the Internet to an MTA serving a different network or “domain” (since many domains can and do exist in a single network);
    2. Relay mail to another MTA (e.g., 12b) on a different subnet within the same network;
    3. Transfer mail from one host or server to another on the same network subnet.
To perform these functions, an MTA accepts messages from other MTAs or MUAs, parses addresses to identify recipients and domains, resolves aliases, fixes addressing problems, copies mail into a queue on its hard disk, tries to process long and hard-to-pass messages, and notifies the sender when a particular task cannot be successfully completed. The MTA does not store messages (apart from its queue) or help users access messages. It relies on other mail system components, such as message delivery agents, message stores and mail user agents (MUAs), to perform these tasks. These additional components can belong to any number of commercial or free products (e.g., POP3 or IMAP servers, Microsoft Exchange, IBM Lotus Notes, Netscape, cc:Mail servers, or the like). Because of its central role in the e-mail systems, however, the MTA often serves as the “glue” that makes everything appear to work together seamlessly.
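Two of the tasks named above, resolving aliases and parsing addresses to identify domains, can be illustrated with a short sketch. The alias table, the addresses, and the function name group_by_domain are assumptions made for illustration; an actual MTA's address handling is considerably more involved.

```python
from collections import defaultdict

# Hypothetical alias table: maps an alias address to the real mailbox.
ALIASES = {"postmaster@example.org": "martha@example.org"}


def group_by_domain(recipients, aliases=ALIASES):
    """Resolve aliases, then group recipient addresses by destination domain."""
    groups = defaultdict(list)
    for address in recipients:
        address = aliases.get(address, address)  # alias resolution
        _, _, domain = address.rpartition("@")   # domain follows the last "@"
        groups[domain].append(address)
    return dict(groups)


groups = group_by_domain(
    ["postmaster@example.org", "larry@example.com", "bob@example.org"]
)
```

Grouping by domain is shown because the routing decision (route, relay, or transfer) depends on where each recipient's domain is served.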
The overall process may be summarized as follows. E-mail is routed via SMTP servers, the so-called “Mail Transfer Agents” (MTA). Users write, send, and read e-mail via Mail User Agents (MUA). To send e-mail, an MUA connects to an MTA which receives the e-mail and routes it to another MTA. An intermediary MTA might forward the e-mail to yet another MTA until the e-mail reaches the destination system, where the e-mail is stored in a mailbox accessible by the recipient.
For further description of e-mail systems, see e.g., Sendmail® for NT User Guide, Part Number DOC-SMN-300-WNT-MAN-0999, available from Sendmail, Inc. of Emeryville, Calif., the disclosure of which is hereby incorporated by reference. Further description of the basic architecture and operation of e-mail systems is available in the technical and trade literature; see e.g., the following RFC (Request For Comments) documents:
RFC821              Simple Mail Transfer Protocol (SMTP)
RFC822              Standard for the Format of ARPA Internet Text Messages
RFC974              Mail Routing and the Domain System
RFC937, RFC1081     Post Office Protocol version 3 (POP3)
RFC1123             Requirements for Internet Hosts - Application and Support
RFC1725             Post Office Protocol version 3 (POP3)
RFC2033             Local Mail Transfer Protocol (LMTP)
RFC2060, RFC2061    Internet Message Access Protocol (IMAP)
RFC2246             The TLS Protocol, version 1.0
RFC2487             SMTP Service Extension for Secure SMTP over TLS
RFCs are numbered Internet informational documents and standards widely followed by commercial software and freeware in the Internet and UNIX communities. The RFCs are unusual in that they are floated by technical experts acting on their own initiative and reviewed by the Internet at large, rather than formally promulgated through an institution such as ANSI. For this reason, they remain known as RFCs even once they are adopted as standards. The above-listed RFC documents are currently available via the Internet (e.g., at http://www.ietf.org/rfc), the disclosures of which are hereby incorporated by reference.
Often when sending e-mail, a distribution or “mailing list” is employed to facilitate the process of sending an e-mail message to a group of people. For instance, instead of addressing an e-mail message to individual members of a recurring group, a user can instead simply define a mailing list to comprise those members. For example, the user could define a “Marketing” mailing list that specifies members of the marketing department of the user's company. Once defined, the mailing list can be used in the recipient field for an e-mail message, in lieu of listing individual members. A message sent to this distribution list goes to all recipients listed. Typically, e-mail systems provide graphical user interface facilities for managing (e.g., adding and deleting) names in a mailing list.
Expectedly, as a particular list grows larger, it becomes progressively more resource intensive and time consuming to manage and process. Although the foregoing example of a mailing list for a marketing department may comprise a comparatively small group of recipients (e.g., fewer than 100), a mailing list can in fact specify an extremely large group of recipients. Consider, for instance, a mailing list defined for customer support (e.g., “North American Users”) for a large software company. As another example, ISPs (Internet Service Providers) typically support many domains, many lists within each domain, and many users for each list. In such a case, a given mailing list may in fact specify many thousands or even millions of recipients, leading to an enormous amount of mailing list traffic. Accordingly, there is great interest in improving the management and processing of mailing lists so that e-mail sent to mailing lists, particularly large ones, is processed in an efficient manner.
In an electronic mail system, the task of processing a mailing list usually falls to a Mailing List Manager or “MLM”, such as MLM 13 of the e-mail system of FIG. 1. Upon receiving an e-mail message sent to a predefined mailing list, the system's MTA hands off the message, with the name of the list, to the system's MLM. After checking the message, the MLM enumerates the individual recipients for the list and hands the message with a list of the specific intended recipients (i.e., with the names/e-mail addresses of the specific intended recipients attached) back to the MTA for redistribution. For instance, if the message had a mailing list specifying 100 recipients, the MLM would, after finishing its work, post the message back to the MTA with each of the 100 recipients specified. Here, the MLM opens a connection (e.g., a “pipe” in UNIX, which is a direct data feed) to the MTA. The MTA is responsible for queuing up the message, arranging for its distribution to all of the various recipients, and retrying failed deliveries.
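The hand-off between MTA and MLM described above can be sketched as follows. The list contents and the function names mlm_expand and mta_enqueue are hypothetical, and an in-memory queue stands in for the pipe and the on-disk mail queue.

```python
from collections import deque

# Hypothetical list definition; names and addresses are illustrative only.
MAILING_LISTS = {
    "marketing": ["ann@example.org", "bob@example.org", "cy@example.com"],
}


def mlm_expand(list_name, body, lists=MAILING_LISTS):
    """MLM step: enumerate the list's recipients and attach them to the message."""
    if list_name not in lists:
        raise KeyError(f"unknown mailing list: {list_name}")
    return {"body": body, "recipients": list(lists[list_name])}


def mta_enqueue(expanded, queue):
    """MTA step: queue one delivery task per recipient, in list order."""
    for recipient in expanded["recipients"]:
        queue.append((recipient, expanded["body"]))
    return queue


queue = mta_enqueue(mlm_expand("marketing", "Quarterly update"), deque())
```

The sketch makes the cost visible: one message to a list of N recipients becomes N queued delivery tasks on the MTA side.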
Without further enhancement to this basic process of handling an e-mail message with a large mailing list, the MLM is handing a substantial amount of work to the MTA to do, with no real intelligence. For instance, for a message sent to a predefined mailing list of 1000 recipients, the MLM is handing to the MTA a list of 1000 tasks to do in sequence—that is, 1000 messages to queue and distribute. At the same time, MTAs tend not to be very good at parallel delivery of a single message. Therefore, the approach commonly employed by MTAs is to do the tasks in series, one at a time. However, that approach incurs the penalty of increased delivery time due to network latency and/or system load.
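The latency penalty of serial delivery can be made concrete with an idealized model. The per-message latency of 0.5 seconds and the worker count of 10 are assumed figures chosen only for illustration, and the model ignores contention, retries, and varying latencies.

```python
import math


def serial_delivery_time(n_recipients, latency_s):
    """Total time when each delivery waits for the previous one to finish."""
    return n_recipients * latency_s


def parallel_delivery_time(n_recipients, latency_s, workers):
    """Idealized time when deliveries are spread evenly over several workers."""
    return math.ceil(n_recipients / workers) * latency_s


# 1000 recipients at an assumed 0.5 s of latency per delivery.
serial = serial_delivery_time(1000, 0.5)          # 500.0 seconds
parallel = parallel_delivery_time(1000, 0.5, 10)  # 50.0 seconds
```

Under these assumptions the serial time grows linearly with list size, which is why a list of millions of recipients is problematic for one-at-a-time delivery.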
Apart from the above one-to-many problem, an analogous problem concerns an e-mail that needs to go to a very large number of people where the e-mail's content or body is not constant but, instead, is customized for a given recipient. In such a case, one has millions of people who are intended recipients of messages that vary in content (i.e., message body); that is, the scenario presents a multitude of one-to-one relationships.
Present-day mass-mailing advertisers face such a problem. Doubleclick, for example, employs a “Composer” program to create customized mass e-mailings (i.e., electronic mailings). The Composer's basic operation is simple. The Composer works against a large list or database of people. Each person, in turn, has signed up to receive one or more specific topics (e.g., about travel, about business, about finance, or the like) in a regular electronic mailing or newsletter. Thus, in this large database of people, everybody has different combinations of what specific information he or she really wants. Based on this user-specific information, the Composer program will compose a customized piece of e-mail for each particular user, inserting the specific pieces of information the user has requested into the e-mail message's body and possibly even using the user's real name. After the appropriate message is composed for a given target user, the Composer directs an accompanying e-mail system to send that message to the target user. The Composer program repeats this basic operation for all individuals in its database.
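The basic Composer loop described above can be sketched as follows. This is not the actual Composer program: the subscriber records, topic content, and the function names compose and run_composer are all hypothetical, and the send callable stands in for the accompanying e-mail system.

```python
# Hypothetical subscriber database and per-topic content.
SUBSCRIBERS = [
    {"name": "Martha", "email": "martha@example.org", "topics": ["travel", "finance"]},
    {"name": "Larry", "email": "larry@example.com", "topics": ["business"]},
]
CONTENT = {
    "travel": "Fare sale this week.",
    "business": "Markets opened higher.",
    "finance": "Rates held steady.",
}


def compose(subscriber, content=CONTENT):
    """Build one customized message containing only the requested topics."""
    lines = [f"Dear {subscriber['name']},", ""]
    for topic in subscriber["topics"]:
        lines.append(f"{topic.title()}: {content[topic]}")
    return {"to": subscriber["email"], "body": "\n".join(lines)}


def run_composer(subscribers, send):
    """Compose and send one message per subscriber, one at a time."""
    for subscriber in subscribers:
        send(compose(subscriber))


sent = []
run_composer(SUBSCRIBERS, sent.append)
```

Because each message body differs, the one-message-many-recipients shortcut of a mailing list does not apply; every subscriber produces a separate message to send.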
For a given user, the foregoing process is relatively fast. However, a mass-mailing database may contain many millions of names. When one is faced with the task of creating customized mass e-mailings for millions of users, the approach of doing one user at a time is rather inefficient. Worse, with the standard systems that are being used today, when a system sends a message, the system waits until that message is accepted, by either the final mail server for delivery, or by an intermediate mail server en route, before the Composer can proceed to the next address. Given the massive scale in which the operation is occurring, there is of course much interest in optimizing the process.
One approach to this problem is to attempt to run the Composer with some amount of parallelism. Here, the Composer is run in such a way that the list of recipients is broken down into smaller groups for parallel processing. For example, a group of one million people may be divided into ten groups of 100,000, each group being processed in parallel (e.g., by ten Composers running in parallel). The improvements with this approach, however, are inadequate. Reasons include excessive disk I/O (input/output) and excessive e-mail queue waiting times. The significant improvements in scalability and throughput simply are not realized with such an approach.
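The partitioning approach described above can be sketched as follows. This illustrates only the splitting of a recipient list into groups that are processed in parallel; it does not address the disk I/O and queue-wait problems just noted. The function names are hypothetical, and process_group is a placeholder that merely counts its recipients instead of composing and sending mail.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(recipients, n_groups):
    """Split a recipient list into at most n_groups contiguous groups."""
    if not recipients:
        return []
    size = -(-len(recipients) // n_groups)  # ceiling division
    return [recipients[i:i + size] for i in range(0, len(recipients), size)]


def process_group(group):
    """Placeholder for one Composer instance working through its group."""
    return len(group)


def run_parallel(recipients, n_groups=10):
    """Process each group on its own worker thread; return per-group counts."""
    groups = split_into_groups(recipients, n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        return list(pool.map(process_group, groups))


counts = run_parallel(list(range(1000)), n_groups=10)
```

Threads are used here only to keep the sketch self-contained; the text describes separate Composer programs running in parallel, which would more likely be separate processes.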
What is needed is an e-mail system that implements parallel processing for mass mailings, with as much resource sharing and re-use, and as little disk I/O, as possible. More particularly, it is desirable to take advantage of today's multithreaded computer systems to send e-mail on one processing thread while another processing thread is waiting for either input or a reply, including streamlining the process so that there is as little waiting as possible.