This invention pertains to the arts of automatic analysis, classification, characterization and routing of text-based messages in electronic messaging systems. The text message filtering and modeling system and method disclosed is especially suitable for use in analyzing, classifying, routing, and directing large volumes of electronic messages with a wide variety of content, authorship, and intent directed generally at a single, large recipient such as a corporation.
Not applicable.
This invention was not developed in conjunction with any Federally-sponsored contract.
Not applicable.
Electronic mail and facsimile ("fax") messaging have become critical tools of everyday personal and business life. Most corporations, government agencies, organizations, and institutions have established fax numbers and e-mail addresses for a wide variety of contact purposes, including requesting information such as literature and office locations from the entity, requesting investment information, requesting service on or technical support for a product, reporting a product problem or failure, submitting suggestions for products and service improvements, submitting complimentary comments, and in some cases, carrying on dialogues with personalities and celebrities associated with the entity. Fax and e-mail messaging have converged in electronic form, as messages originating in the form of fax are commonly captured by computers with fax/modem interfaces and optically converted to text files, and as many services offer low cost fax message delivery via e-mail-based interfaces.
Underlying the tremendous proliferation of fax and e-mail are several factors, including the widespread availability of inexpensive e-mail clients such as personal computers, the availability of inexpensive fax machines, and the development of common standards for the exchange of electronic text messages between computers, including RFC821 Simple Mail Transfer Protocol ("SMTP") from the Internet Network Information Center, and Recommendation X.400 from the International Telecommunications Union ("ITU").
Consequently, corporations, government agencies, and other entities which successfully promote the availability of their fax telephone numbers and e-mail addresses can receive thousands to tens of thousands of messages per day. Traditionally, all of the messages are received in a general repository, or "mailbox", and reviewed by human agents, who assess the content and intent of each message and determine its correct disposition. This may involve sending the author a standard reply, and/or copying or forwarding the e-mail to one or more divisions, departments, or individuals within the organization for further handling. In the latter case, where multiple parties must be consulted, the consolidation of replies from all of the parties can be cumbersome and overwhelming, given the volume of messages to be handled. For example, assume a company receives five thousand messages per day, and each of those messages contains issues or requests that involve an average of three departments or individuals. The original message must be read once by the reviewing agent, and each receiving department may read the forwarded message one to three times before it reaches the person who can respond. Under such circumstances, 5,000 received e-mails may result in 20,000 to 50,000 reviews of those messages in the company. In many cases, the final recipient may need to instigate a short dialogue, including several message exchanges with the author, in order to ascertain exactly what the author needs or how the author can be serviced. So, a daily volume of 5,000 new messages quickly accumulates to a total network volume and work load of tens of thousands to even a hundred thousand messages per day.
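The review counts in the preceding example can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed system; the function name review_count and its parameter names are hypothetical. It assumes, as in the example, one initial review per message by the reviewing agent, plus one to three reads in each of three receiving departments.

```python
def review_count(messages, departments_per_message, reads_per_department):
    """Return the total number of times messages are read in the organization.

    Assumes each message is read once by the initial reviewing agent and then
    reads_per_department times in each department it is forwarded to.
    """
    initial_reviews = messages
    departmental_reads = messages * departments_per_message * reads_per_department
    return initial_reviews + departmental_reads


# Figures taken from the example: 5,000 messages per day, 3 departments each.
low = review_count(5000, 3, 1)   # each department reads the message once
high = review_count(5000, 3, 3)  # each department reads the message three times
print(low, high)  # 20000 50000
```

The lower bound is 5,000 + (5,000 x 3 x 1) = 20,000 reviews and the upper bound is 5,000 + (5,000 x 3 x 3) = 50,000 reviews, before counting any follow-up dialogue with the author.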
Analogous situations exist in the telephone call center and paper mail paradigms. For example, a single toll-free telephone number may be used for customer orders, information requests, service reports, etc. In this paradigm, systems for handling large call volumes, known as Automated Call Distributors, have been developed to sort and route telephone calls to human agents. Systems known as "Interactive Voice Response" have been developed to allow many of the calls to be handled entirely automatically by providing bank-by-phone, tele-reservations, and other well-known telephone-based services. In the paper mail paradigm, automated sorting and routing systems have been developed using barcode markings and optical recognition of handwriting.
The following publications and standards provide additional background information on the arts of e-mail routing, natural language processing, and pattern recognition:
1. Internet Network Information Center ("InterNIC") Request for Comment 821, "Simple Mail Transfer Protocol" (SMTP), Filename RFC821.TXT from http://www.internic.net.
2. International Telecommunications Union ("ITU") Recommendation X.400, available from the ITU, Berne, Switzerland, and from the ITU's website at www.itu.org.
3. "Fuzzy and Neural Approaches in Engineering" by Lefteri H. Tsoukalas and Robert E. Uhrig, published by John Wiley and Sons, Inc., copyright 1997, ISBN number 0-47116-003-2.
4. "Pattern Recognition and Image Analysis" by Earl Gose, Richard Johnsonbaugh, and Steve Jost, published by Prentice Hall, copyright 1996, ISBN number 0-13-23645-8.
5. "Natural Language and Exploration of an Information Space: The ALFresco Interactive System", a white paper by Oliviero Stock, appearing starting on page 421 of the book "Readings in Intelligent User Interfaces", edited by Mark T. Maybury and Wolfgang Wahlster, published by Morgan Kaufmann Publishers, Inc., copyright 1998, ISBN number 1-55860-444-8.
6. U.S. Pat. No. 5,768,505 to Gilchrist, et al.
7. U.S. Pat. No. 5,859,636 to Pandit.
In the electronic messaging arts, United States patents have been issued for systems which route messages based on well-defined codes stored within the message, including the recipient's network address and a copy list of network addresses. Well-known methodologies also exist which individually yield useful information about and characterizations of written messages, including neural network, fuzzy logic, and statistical analysis techniques. However, the art lacks automatic systems which couple these analysis and characterization techniques with message routing technology to perform intelligent routing of messages addressed to a multipurpose network address.
Therefore, there exists a need in the art for an automated system and method to review large volumes of text messages for their content, intent, need, and purpose in order to reduce the time-to-response to the messages.
Further, there exists a need in the art for this automated system to use conventional technology and techniques which find practical application to the analysis of natural language written speech.
Additionally, there exists a need in the art for this system to provide for initial training of the rules and thresholds used in the analysis of the messages, and for the training, or "learning", of the algorithms to continue over time based on user input and changes to initial analysis conclusions, such that the utility of the system grows as it learns how to filter messages.
Still further, there exists a need for this system to be implemented using an architecture which allows the addition, removal, and upgrade of the methodologies in order to tune the system to particular applications and to update the system's performance as new technologies become available.