Electronic mail has become one of the most widely used business productivity applications. However, people increasingly feel frustrated by their electronic mail. They are overwhelmed by the volume, lose important items, and feel pressure to respond quickly. Though electronic mail usage has changed, electronic mail clients have changed little since they were first invented. Although today's electronic mail clients are more graphical, with onscreen buttons, pull-down menus, and rich-text display, they are essentially derivatives of the electronic mail clients of thirty years ago. Most electronic mail clients today have the same set of features and organizational structures: multiple folders in which messages can be filed, a textual listing of the messages within a given folder, and the ability to preview a selected message. However, studies have shown that folder systems quickly degrade as the number of messages people receive grows. Most people end up keeping all of their electronic mail in one large folder. The content and use of electronic mail have also changed. In addition to traditional letters, electronic mail now consists of invitations, receipts, transactions, discussions, conversations, tasks, and newsletters, to name a few variations.
Information overload motivates the need for automatic document summarization programs. The incentive, from a corporate standpoint, is that users need the ability to decide quickly which threads to examine, and which entries in a thread might be interesting.
Electronic mail threads are groups of replies that, directly or indirectly, are responses to an initial electronic mail message. While many utilities and theories have been developed to address the problem of summarizing single documents, little known work has been done specifically with regard to electronic mail thread summarization. Electronic mail messages, unlike archival documents, are often short, informal, and not well-formed. When commercially available document summarization programs are used on electronic mail, the resulting summaries lack context and often consist instead of electronic mail signatures or header fields mentioned in electronic mail messages. The summary results for a single electronic mail message become more relevant when additional context, represented by the electronic mail thread enclosing the message, is used. Electronic mail threads provide valuable context for summarizing electronic mail messages, and allow document summarization programs to exploit structure in electronic mail that is not found in other documents.
International Business Machines Corporation has published an algorithm for summarizing discussion databases, such as Usenet newsgroups or Notes discussion groups. However, application of such an algorithm to the task of summarizing electronic mail threads presents difficulties, as electronic mail threads differ from discussion databases in a number of ways. First, discussion databases archive all of the content of discussion groups. As a result, discussion group summarizers never have to deal with deleted documents when analyzing threads. Second, discussion group summarizers do not have to address a thread discovery problem because discussion groups have a true parent-child hierarchy. Third, electronic mail contains additional structure, which discussion group summarizers do not exploit.
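The thread discovery problem and the effect of deleted messages noted above can be illustrated with a minimal sketch. It is not the method of this document or of the published discussion database algorithm. It assumes that each message carries a Message-ID header field and that each reply carries an In-Reply-To header field naming its parent, as in standard Internet mail. The function name discover_threads and the sample messages are hypothetical.

```python
from collections import defaultdict
from email import message_from_string


def discover_threads(raw_messages):
    """Group messages into threads using Message-ID / In-Reply-To headers.

    Returns a dict mapping each root Message-ID to the list of Message-IDs
    that belong to that thread (assumed representation, for illustration).
    """
    # Map every known message to the parent it claims to reply to.
    parent_of = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        if msg_id is None:
            continue  # a message without an identifier cannot be placed
        parent = msg["In-Reply-To"]
        parent_of[msg_id.strip()] = parent.strip() if parent else None

    def root_of(msg_id):
        # Walk upward until there is no parent, the parent is missing
        # (for example, deleted), or a header cycle is detected.
        seen = set()
        while True:
            parent = parent_of.get(msg_id)
            if parent is None or parent not in parent_of or parent in seen:
                return msg_id
            seen.add(msg_id)
            msg_id = parent

    threads = defaultdict(list)
    for msg_id in parent_of:
        threads[root_of(msg_id)].append(msg_id)
    return dict(threads)


# Hypothetical mailbox: message <3@example.com> has been deleted.
sample = [
    "Message-ID: <1@example.com>\nSubject: Budget\n\nInitial message.",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "Subject: Re: Budget\n\nFirst reply.",
    "Message-ID: <4@example.com>\nIn-Reply-To: <3@example.com>\n"
    "Subject: Re: Budget\n\nReply to a deleted message.",
]
threads = discover_threads(sample)
```

In this sketch the reply whose parent was deleted becomes the root of a separate thread, although it belongs to the same conversation. A discussion database, which archives every entry and records a true parent-child hierarchy, would not produce such a gap.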
Accordingly, a need exists for a way to summarize electronic mail in a manner that produces meaningful results.
A further need exists for a way to summarize complete electronic mail threads so that such summaries may be presented in a useful manner to a user.