Dealing with a large volume of e-mail is recognized as a ubiquitous knowledge-worker problem. Not only does e-mail quickly accumulate in inboxes and other folders, with many contained threads left unread for long periods, but also people frequently need to become acquainted with the deliberations recorded in a high-volume public or private discussion. Numerous approaches have attempted to deal with the problem to some extent.
Conventional mailers and on-line archives list messages sorted by subject and date. This approach allows a user to focus on a single subject at a time, but requires that messages be viewed one at a time, in a fragmented way. Also, some mailers, such as Microsoft® Outlook®, may optionally supply the first few lines of a message in a folder listing. However, such a listing draws these lines from any material in the message, including quoted passages. Thus, the user often views redundant information rather than the new subject matter of the particular e-mail.
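The preview limitation described above can be illustrated with a minimal sketch. It is not taken from any particular mailer; the function names and the rule that a quoted line begins with ">" are assumptions made only for illustration.

```python
def is_quoted(line: str) -> bool:
    # Assumption: a line whose first non-space character is '>' is quoted.
    return line.lstrip().startswith(">")


def naive_preview(body: str, n: int = 2) -> list[str]:
    # Takes the first n non-blank lines, whatever they contain.
    lines = [ln for ln in body.splitlines() if ln.strip()]
    return lines[:n]


def new_text_preview(body: str, n: int = 2) -> list[str]:
    # Takes the first n non-blank lines that are not quoted material.
    lines = [ln for ln in body.splitlines() if ln.strip() and not is_quoted(ln)]
    return lines[:n]


MESSAGE = """\
> Can we move the review to Friday?
> I am out on Thursday.
Friday works for me.
I will book the room.
"""

print(naive_preview(MESSAGE))     # shows only the quoted lines
print(new_text_preview(MESSAGE))  # shows only the sender's new lines
```

With this sample message, the naive preview consists entirely of quoted text, while the filtered preview shows the sender's own contribution.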
In another approach to dealing with the volume problem, some conventional mailing list managers permit digested subscriptions. Examples of such mailing list managers include ListProc, LISTSERV Lite, MajorDomo and SmartList. Such managers allow users to elect to receive collections of messages within a single external message, often once per day, to reduce the frequency of messages received from the associated list and to reduce reading fragmentation. The digested subscriptions permit more efficient reading by combining submissions into a single message. Reading a collection of related messages in a single document can lessen the cognitive burden on a user to recall the context surrounding an individual message. However, automatic digests may capture only small parts of a conversation and may also include more than one conversation. Thus, reading a single thread may require inspecting a number of digests, and reading material from a single thread within a digest is often interrupted by material from other threads. Also, while digests may omit some irrelevant parts of message headers, they do not deal with other types of redundant or irrelevant material whose presence inhibits efficient reading. Examples of such unnecessary material include an entire earlier mail message (or message chain) included for reference, long quotes from one or more earlier messages, signature boxes, aphorisms and the like. When an individual message is viewed without previous messages available, extensive contextual information may be necessary for comprehension. When a message appears in a digest, or is read in its place in a threaded sequence, however, the same contextual information may seem superfluous and may also interfere with the reading sequence, because readers must at least skim past the redundant information to look for new material.
Removing extraneous material requires analyzing the content of the message to some extent. One approach to message analysis, for a different purpose, is described by R. Sproat and H. Chen in EMU: An Email Preprocessor for Text to Speech, IEEE Signal Processing Society Workshop on Multi Media Signal Processing, Los Angeles, 1998. This paper describes a combination of finite state machines. The first finite state machine assigns a set of weights to each line, one for each of eight fixed, relatively coarse, line classes. This automaton operates on the lines encoded into sequences of character classes (upper and lower case letters, digits, different kinds of punctuation) and is trained on tagged lines. The resulting network is then combined with another automaton which imposes additional restrictions, such as requiring that all lines in a blank-line-separated block be of the same type. This second automaton operates only to constrain the results of the first one. The resulting, relatively coarse analysis is suitable to a vehicle designed for a text-to-speech application, in which all the material is to be read. Therefore, this approach does not attempt a detailed line-type analysis based on a full message grammar, that is, an analysis intended to isolate material which may be omitted or elided (e.g., quoted passage introductions, message closings, aphorisms or the like) and material which must be differentially formatted (e.g., program code, which frequently appears in software-related discussions). It is, however, used in conjunction with a further approach that is needed to allow reading of message endmatter, which may be two-dimensional.
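The two-stage arrangement described above (per-line weights computed over character-class encodings, followed by a constraint that every line in a blank-line-separated block receive the same type) can be sketched as follows. This is an illustrative simplification and not the EMU implementation: the three class names, the hand-written weights standing in for trained ones, and all function names are assumptions.

```python
CLASSES = ("text", "quote", "signature")  # hypothetical; the paper uses eight classes


def char_class(ch: str) -> str:
    # Map one character to a coarse character class.
    if ch.isupper():
        return "U"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "D"
    if ch.isspace():
        return "S"
    return "P"  # punctuation and anything else


def encode(line: str) -> str:
    return "".join(char_class(c) for c in line)


def line_weights(line: str) -> dict[str, float]:
    # Stage 1: one weight per class for a single non-blank line.
    # Hand-written heuristics stand in for weights trained on tagged lines.
    code = encode(line)
    n = len(code)
    return {
        "text": (code.count("U") + code.count("L")) / n,
        "quote": 2.0 if line.startswith(">") else 0.0,
        "signature": (code.count("P") + code.count("D")) / n,
    }


def best(weights: dict[str, float]) -> str:
    return max(weights, key=weights.get)


def classify(lines: list[str]) -> list[str]:
    # Stage 2: sum the weights over each blank-line-separated block and
    # give every line in the block the class with the highest total.
    labels = [""] * len(lines)
    block: list[int] = []
    for i, line in enumerate(lines + [""]):  # trailing blank flushes the last block
        if line.strip():
            block.append(i)
            continue
        if block:
            totals = {c: 0.0 for c in CLASSES}
            for j in block:
                for c, w in line_weights(lines[j]).items():
                    totals[c] += w
            label = best(totals)
            for j in block:
                labels[j] = label
            block = []
        if i < len(lines):
            labels[i] = "blank"
    return labels


SAMPLE = [
    "> Shall we ship on Monday?",
    "> ok",
    "",
    "Yes, Monday is fine.",
    "",
    "--",
    "Jane Doe",
    "Tel. 555-0100",
]

print(classify(SAMPLE))
```

In the sample, the line "Jane Doe" would be labeled as ordinary text if scored on its own, but the block constraint assigns it the same type as its neighbors in the closing block.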
H. Chen and R. Sproat describe the further analysis in a paper entitled Integrating Geometrical and Linguistic Analysis for Email Signature Block Parsing, ACM Transactions on Information Systems, Volume 17, No. 4, October 1999. This more detailed analysis reprocesses the end parts of messages that were processed by the automata described in the previous paper. It combines geometric analysis, to detect vertical sections of blocks, with another weighted finite state machine that analyzes and verifies alternative vertical section decompositions using detailed linguistic criteria.
A paper entitled Cut as a querying unit for WWW, Netnews, email by T. Keishi, Y. Mizuuchi et al., in Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia: Links, Objects, Time and Space Structure in Hypermedia Systems, 1998, p. 235, discloses a specification of a method for detecting quotes and for using these quotes in threading e-mails.
U.S. Pat. No. 5,905,863 discloses a method for finding a best single message predecessor in a thread, using quoted vs. non-quoted text comparisons and also using statistically-based message text comparisons.
A paper entitled Automatic animation of discussions in USENET by J. Yabe, S. Takahashi and E. Shibayama in Proceedings of AVI 2000, Palermo, provides a discussion of linear sequencing of message segments such that elements of messages responding to a passage are arranged near that passage.
All documents cited herein, including the foregoing, are incorporated herein by reference in their entireties.