1. Field of the Invention
The present invention generally relates to an electronic messaging system and method of use and, more particularly, to an electronic messaging system that automatically and accurately responds to user queries (e.g., input documents) using a two stage searching and retrieval system.
2. Background Description
Electronic commerce (e.g., the electronic sale of products and customer services relating to those products) over the Internet and, more importantly, the World Wide Web (WWW) portion of the Internet is becoming increasingly important to a business's viability and economic health. However, in order to effectively conduct business over the Internet or any other electronic means, it is imperative that the user (e.g., a customer) have easy access to the information that is available through the Internet (or other electronic means), and in particular, product and service information contained in a content rich web site sponsored by a particular entity (such as, for example, a company, business, institution and the like). It is equally important that the user have access to the required information via a proprietary database accessed via an intranet, LAN, single use computer, or other similar system.
In order to be an active and competitive participant in the field of electronic commerce, it is critical that the business be responsive to electronic user queries, as well as providing pertinent information in response to those queries, via the content rich web site (or off-line help desk). This provides an avenue for the user to obtain timely information about services and products offered by the business of concern, and further provides a cost efficient method for the entity to disseminate such information to the user.
There are several means of providing pertinent information in response to user queries, such as electronic mail (email), frequently asked question (FAQ) databases and on-line/off-line help desks. These methods permit the user to become more familiar with the company or entity it is doing business with, and thus more comfortable with purchasing products and/or services and/or requesting information over the Internet or other electronic means from the business. This, in turn, may translate into increased traffic on the web site (e.g., the customer again revisiting the web site) or proprietary database (or other electronic system) which is instrumental in increasing revenue growth of the business via its electronic commerce activities. This same interaction also provides a valuable service to the entity of concern by allowing the business or other entity to better serve its customers in a more cost efficient manner.
It is also important to note that such systems may also be instrumental in assisting employees of a company or institution in obtaining internal and/or confidential information that would otherwise be difficult or time consuming to obtain, and which is not accessible to the general public. This information may be accessed via an intranet, LAN or other similar system, and would allow employees to readily obtain information that may be needed in the performance of their employment. By way of example, a customer representative may need to access a help-desk database in order to assist a customer regarding a certain topic.
When using email, the user simply requests certain information and forwards that request via electronic means to the business entity or other concerned party. To this end, businesses, for example, receive and generate many electronic messages in the course of their commerce and activities, which are routed, via a mail system (e.g., server), to a specific individual or individuals, or a general inquiry center. Once the specific individual or individuals receive the message, it is opened, read, and an appropriate action is taken, such as, for example, forwarding the message to another individual, responding to the message, or performing any of countless other actions. Typically, this is a time consuming and inefficient use of resources and, in many instances, does not adequately address the user's query in a timely manner.
For example, in large institutions, such as banks, electronic messages are routed to the institution generally, and not to any specific individual. In these instances, several individuals may have the sole function of opening and reading the incoming messages, and properly routing the messages so that, for example, an appropriate action by a qualified specialist can be performed on the message. As can be imagined, this is very time consuming and inefficient, especially when messages need expert attention in several divergent fields.
A more time efficient but less accurate manner of responding to a user's query is to provide a FAQ database which allows the user to query the database for certain information. In these "auto-response" systems, the user asks a general or specific question and a "weak" search engine performs, for example, a keyword matching or nearest neighbor determination, to return a list of potentially relevant documents (responses or answers). However, these searching techniques do not make definite decisions regarding whether a document or answer is relevant to the user query, or present the answers in a manner that is intuitive to the user. Accordingly, the user is typically required to search through a possibly large set of documents in order to find the appropriate answer to the user query. This is especially true when the database of answer documents is large, and such nearest neighbor or other similar known search technologies return a large set of potentially relevant documents or answers.
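The behavior of such a "weak" search engine can be illustrated with a minimal Python sketch of keyword matching. The FAQ entries, the function names and the scoring by count of shared words are all assumptions made for illustration only; they are not taken from any particular system. The sketch shows how every entry sharing even a common word with the query is returned, leaving the user to sort through the list.

```python
import re

# Hypothetical FAQ entries (illustrative only; not from the source text).
FAQ = {
    "reset-password": "How do I reset my account password",
    "track-order": "How do I track the status of my order",
    "return-item": "How do I return an item for a refund",
}

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(query, faq=FAQ):
    """Return (entry id, shared keyword count) for every entry that shares
    at least one keyword with the query, highest count first."""
    query_tokens = tokenize(query)
    hits = []
    for doc_id, text in faq.items():
        overlap = len(query_tokens & tokenize(text))
        if overlap > 0:
            hits.append((doc_id, overlap))
    # Sort by descending overlap, then by id for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

results = keyword_search("how do I get a refund for my order")
```

In this sketch all three entries are returned, because each shares common words such as "how" and "my" with the query. No decision is made about which entries are actually relevant.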
The FAQ database is a simple and cost efficient means for providing responses to user queries. It is also easy to maintain and update, simply by adding more answers to the database. However, as discussed above, the database becomes less accurate when more answers are placed in the database. Another compromise in the use of FAQ databases is the fact that a large number of responses may be returned, none of which is responsive to the customer query. This is a result of the "weak" search engine used by the FAQ databases. Thus, although there is a low barrier to entry, some users may become frustrated by (i) the many returned responses that must be read prior to obtaining a correct answer or (ii) not receiving a response that is responsive to the user's query.
An off-line help desk is another way of disseminating information to a user. In this case, the user calls via telephone or requests via email (or other electronic means) certain information, and the request is then routed to an operator. The operator then queries a database of answers in order to appropriately answer the user's query. This may be performed in the same manner as a FAQ database, to wit, using a "weak" search engine which performs, for example, a keyword matching or nearest neighbor determination. However, the off-line help desk may instead use machine learning techniques which require sample training data. While more accurate than weaker search techniques, current machine learning techniques, alone, suffer from the fact that they are costly to develop and maintain and have a low performance speed.
It is desirable, however, to have an electronic response system that effectively and efficiently responds to a user's query. This includes providing timely and accurate responses to the user query without the assistance of a qualified specialist or other individual having to read and respond to the incoming message.
In order to effectively and efficiently respond to a user's query, a two stage messaging system is required. This system would preferably combine a "weak" search engine with a machine learning technique in order to respond to the user's query in an accurate and timely manner. This two stage messaging system would be cost efficient, easy to maintain, and provide a high speed and accurate response system. The general applications would include email systems and any database that may potentially be queried, and would preferably include, at least, classification and categorization of natural language documents and automated electronic data transmission processing and routing.
The present invention is directed to a two stage electronic messaging system and method of use that automatically and accurately responds to user queries (e.g., input documents) using a two stage searching and retrieval system. In order to accomplish the objectives of the present invention, fast document-matching techniques (e.g., "weak" search techniques) in combination with more advanced categorization and text-search techniques (e.g., machine learning and other semi-automated techniques) are provided. The two stage searching and retrieval system of the present invention may be used via the Internet, an intranet, local area network (LAN) or other similar system, and may be used for providing requested information to a user (e.g., customer, employee, customer representative and the like) via a content rich web site, a proprietary database or any computer related help system.
More specifically, a user inputs document data which is received by a machine-learning based categorizer. The categorizer first classifies the input document in terms of categories, which effectively narrows the set of possible relevant responses. The categorizer may also assign confidence levels associated with the categories assigned to the input document. By way of example only, the categorizer may analyze the incoming text, which may include tokenization of the text, morphological analysis of the text, or other known text processing techniques in order to establish one or more categories.
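As an illustration of this first stage, the following Python sketch tokenizes an input document and assigns each category a confidence level. A small naive Bayes classifier with add-one smoothing is used purely as a stand-in for the machine-learning based categorizer; the text does not name a specific learning technique, and the class name, training samples and category labels below are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split lowercased text into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesCategorizer:
    """Stand-in categorizer (assumption): naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of samples
        self.vocab = set()

    def train(self, samples):
        """Learn token statistics from (text, category) training pairs."""
        for text, category in samples:
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocab.update(tokens)

    def categorize(self, text):
        """Return (category, confidence) pairs, most confident first."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category in self.doc_counts:
            total_words = sum(self.word_counts[category].values())
            score = math.log(self.doc_counts[category] / total_docs)
            for token in tokens:
                count = self.word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            log_scores[category] = score
        # Normalize the log scores into confidence levels that sum to 1.
        peak = max(log_scores.values())
        weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(weights.values())
        confidences = {c: w / norm for c, w in weights.items()}
        return sorted(confidences.items(), key=lambda kv: -kv[1])

# Hypothetical training data (illustrative only).
categorizer = NaiveBayesCategorizer()
categorizer.train([
    ("my credit card was charged twice", "billing"),
    ("refund the charge on my invoice", "billing"),
    ("the package has not arrived yet", "shipping"),
    ("track my delivery and shipping date", "shipping"),
])
ranked = categorizer.categorize("why was my card charged twice")
```

Here `ranked` lists every known category with its confidence level, so a later stage can restrict its search to the most confident categories.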
Once the specified categories are established, a second stage using weaker similarity matching technology (e.g., an example based response generator) then searches the restricted, more focused parts of the entire dataset. The dataset of responses is grouped according to a set of predetermined categories and, optionally, may include confidence levels. The example based response generator may provide simple search techniques, such as similarity matching techniques, keyword searching or other known searching techniques that do not need to be trained on data.
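The restricted second search can be sketched as follows. This is a minimal illustration only: the grouped response dataset, the function names and the choice of Jaccard word overlap as the similarity measure are assumptions, since the text leaves the similarity matching technique open. No training step is involved; adding a response is just a matter of appending it to a category.

```python
import re

# Hypothetical response dataset grouped by predetermined category.
RESPONSES = {
    "billing": [
        "To dispute a duplicate charge, contact the billing team",
        "Refunds are issued to the original payment method",
    ],
    "shipping": [
        "Orders ship within two business days",
        "Use the tracking number to follow your delivery",
    ],
}

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Similarity of two token sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search_in_categories(query, categories, dataset=RESPONSES):
    """Score only the responses that belong to the given categories and
    return (score, category, response) tuples, best score first."""
    query_tokens = tokenize(query)
    scored = []
    for category in categories:
        for response in dataset.get(category, []):
            score = jaccard(query_tokens, tokenize(response))
            scored.append((score, category, response))
    scored.sort(key=lambda item: -item[0])
    return scored

# The categories would come from the first stage; "billing" is assumed here.
matches = search_in_categories("I see a duplicate charge on my card", ["billing"])
```

Because only the "billing" group is searched, the shipping responses are never scored, which is how the first stage narrows the work done by the weaker matcher.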
The example based response generator of the present invention makes it easy to integrate and add further information to the database without the need for training on data. This added information may, however, later be used as training data so that the more advanced search techniques may utilize this information and provide more accurate category information.
The example based response generator may also provide a "score" or "rank" associated with the response retrieved from the specified categories. This "score" or "rank" may assist the user in more easily and accurately finding the most appropriate response to the input document by ranking the responses in order of importance. The categories may also be ranked according to a predetermined ranking scheme.
Once the categories and responses are selected, they may be displayed on a display in accordance with the confidence levels and ranks, in descending or ascending order. Thus, if a category or response does not meet or exceed a threshold level, for example, the category and the response will not be displayed (e.g., if the confidence level of the input document does not meet or exceed the confidence level of the categorized response, then it may not be displayed). Also, in the embodiments of the present invention, the categories and responses may be listed according to the confidence levels and ranks.
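The display step can be illustrated with a short Python sketch that drops candidates below a threshold and orders the remainder by confidence level and rank. The `Candidate` record, the default threshold of 0.5 and the tie-breaking by response score are assumptions chosen for illustration; the text does not fix any of these values.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    category: str
    confidence: float   # confidence level assigned to the category
    response: str
    rank_score: float   # score or rank assigned to the response

def select_for_display(candidates, threshold=0.5, descending=True):
    """Keep candidates whose category confidence meets or exceeds the
    threshold, ordered by confidence and then by response score."""
    kept = [c for c in candidates if c.confidence >= threshold]
    return sorted(kept,
                  key=lambda c: (c.confidence, c.rank_score),
                  reverse=descending)

# Hypothetical candidates (illustrative values only).
candidates = [
    Candidate("shipping", 0.04, "Tracking your order", 0.30),
    Candidate("billing", 0.96, "Refund policy", 0.05),
    Candidate("billing", 0.96, "Dispute a duplicate charge", 0.21),
]
shown = select_for_display(candidates)
```

With these values the shipping candidate falls below the threshold and is not displayed, while the two billing responses are listed in descending order of their scores. Passing `descending=False` lists them in ascending order instead.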
Thus, the technique of the present invention increases the odds of finding correct and responsive answers to the user's query (or input document).