The growth in software applications and computer hardware components, not only in terms of volume, but also in terms of complexity and diversity, continues to rapidly accelerate. At the same time, the number of computer users is quickly increasing. The ability of vendors to provide support and assistance to computer users is a matter of significant concern. Current user support typically consists of a user calling a helpdesk for human assistance. Unfortunately, the current process of primarily relying upon human intervention for providing user assistance is not meeting the growing demand for support.
Oftentimes, a computer user is at the client end of a network, which could be a local area network, intranet, or the Internet. The computer user at the client site may encounter many different text messages produced by a wide variety of events, many of which require support. Error messages and support requests are common examples of event messages occurring at the client machine that require support. In this scenario, where the user is remote from the physical location of the human assistant, the information available to the user and, thus, the human assistant, is very limited, consisting of the text message associated with the event.
Presently, in a client/server setting, event messages are stored as text strings on the client. In the vast majority of cases, these text strings lack a unique identifier. As a result, computer programs are unable to determine the source of the event that may require support. For instance, the error message “insufficient memory” could stem from many different sources. Generally, event messages are now being handled in an inflexible rule-based manner, in which every character in each message must be exactly known, accounted for and stored in a file along with the appropriate action to be taken for that particular message.
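By way of illustration only, the inflexible rule-based handling described above may be sketched as an exact-match lookup table. The message strings, action names, and the function `lookup_action` are hypothetical and are not taken from any particular system; the sketch merely shows that a message differing by a single character from a stored string is not recognized.

```python
# Hypothetical rule table: each exact message string maps to one action.
RULES = {
    "insufficient memory": "close_other_applications",
    "disk full": "free_disk_space",
}


def lookup_action(message):
    """Return the stored action for an exactly matching message, else None."""
    return RULES.get(message)


exact_match = lookup_action("insufficient memory")
# A capital letter and a trailing period are enough to defeat the rule.
near_miss = lookup_action("Insufficient memory.")
```

Under this assumed scheme, every variant of every message must be anticipated and stored in advance, which is the source of the development and maintenance burden noted above.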
The current processes for handling event messages that require support have many flaws. Most notably, the current processes often fail to provide a user or a remote human assistant with a sufficient amount of diagnostic information, i.e., an amount of diagnostic information adequate to quickly identify and resolve a problem. The current processes also require labor-intensive development and testing. Additionally, the current processes use large quantities of memory and other resources at the client site. Accordingly, there exists a need to improve user support and assistance, while at the same time reducing the need for human intervention when providing user support and assistance. The present invention is directed to fulfilling this need.
As will be better understood from the following description, the present invention employs a classifier for classifying textual informational objects. While there exist many different ways to classify text, the present invention uses a support vector machine, a known text classifier, to classify textual information. Brief descriptions of text classifiers in general and of text classification using support vector machines specifically, are provided below. For a more detailed description of support vector machines, attention is directed to U.S. patent application Ser. No. 09/102,946, filed Jun. 23, 1998, entitled “Methods and Apparatus For Classifying Text and For Building A Text Classifier”, by inventors Susan T. Dumais, John C. Platt, David E. Heckerman, Mehran Sahami, and Eric J. Horvitz, and commonly assigned.
One way textual informational objects can be classified is manually, by trained professionals. However, manual text classification is very time consuming and costly. Therefore, this approach is often impractical. Consequently, ways to automate text classification have been developed. In some cases, rule-based approaches are used when objects must be classified with absolute certainty. However, rule-based methods are also limited because they generally require manual construction of the rules, make rigid binary decisions about category membership, and are typically difficult to modify.
Another strategy is to use inductive learning techniques to automatically construct classifiers. Inductively learned classifiers are trained using labeled training data, which consists of examples of items that are in each category and may also include examples of items specifically not in a given category. Weights are assigned to terms or features of an item to represent the importance or relevance of that term or feature to a category. The weights can be adjusted during training until the classifier performs optimally. A separate classifier is trained or learned for each category. Each classifier outputs a graded measure of category membership, so different thresholds can be set to favor precision or recall depending on the application. New items are classified by computing a score and comparing the score with a learned threshold. New items whose scores exceed the threshold are considered as belonging to the category.
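The following is a minimal sketch of the train-then-threshold scheme just described, offered for illustration only. It assumes a simple perceptron-style update rule, which is not named in the description above, and the training messages, the category ("memory problem"), and the function names `tokenize`, `score`, and `train` are all hypothetical.

```python
def tokenize(text):
    """Split a message into lowercase terms (an assumed, very simple scheme)."""
    return text.lower().split()


def score(weights, text):
    """Graded measure of membership: sum of the weights of the terms present."""
    return sum(weights.get(term, 0.0) for term in set(tokenize(text)))


def train(examples, epochs=20):
    """Adjust term weights and the threshold on each misclassified example."""
    weights, threshold = {}, 0.0
    for _ in range(epochs):
        for text, in_category in examples:
            if (score(weights, text) > threshold) == in_category:
                continue  # already classified correctly
            step = 1.0 if in_category else -1.0
            for term in set(tokenize(text)):
                weights[term] = weights.get(term, 0.0) + step
            threshold -= step
    return weights, threshold


# Hypothetical labeled data: True means "in the category".
EXAMPLES = [
    ("insufficient memory to open file", True),
    ("out of memory error", True),
    ("printer is out of paper", False),
    ("file saved successfully", False),
]

weights, threshold = train(EXAMPLES)
is_memory_problem = score(weights, "insufficient memory") > threshold
```

A new item is thus classified by computing its score and comparing the score with the learned threshold; one such classifier would be trained for each category of interest.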
The resulting learned text classifiers have many advantages. For instance, inductively learned text classifiers are easy to construct and update, since they depend only on information that is easy for people to provide, namely, examples of items that are in or out of categories. Constructing or updating such a classifier therefore requires only subject knowledge, not programming or rule-writing skills. Inductively learned text classifiers can also be customized to specific categories of interest to individuals, and they allow users to easily trade off precision and recall depending on their task. Several inductively learned classifiers are presently known to those skilled in the art, such as neural networks, Bayesian networks, and support vector machines.
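The trade-off between precision and recall mentioned above may be illustrated by the following sketch. The scores and labels are invented for illustration, and the function name `precision_recall` is hypothetical; an item is treated as belonging to the category when its score exceeds the chosen threshold.

```python
def precision_recall(scored_items, threshold):
    """Compute (precision, recall) for items given as (score, is_in_category)."""
    tp = sum(1 for s, y in scored_items if s > threshold and y)
    fp = sum(1 for s, y in scored_items if s > threshold and not y)
    fn = sum(1 for s, y in scored_items if s <= threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented classifier outputs paired with the true category membership.
SCORED = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]

low_threshold = precision_recall(SCORED, 0.3)   # admits more items
high_threshold = precision_recall(SCORED, 0.7)  # admits fewer items
```

With these invented values, the lower threshold retrieves every item in the category at the cost of one false positive, whereas the higher threshold admits no false positives but misses one item in the category.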
Support vector machines have been found to be more accurate at text classification than Bayesian networks. (Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”, LS-8 Report 23, University of Dortmund Computer Science Department (November 1997).) Although support vector machines are known to those skilled in the art, a brief description of the general idea behind support vector machines follows.
Generally, an object to be classified by support vector machines may be represented by a number of features. If, for example, the object to be classified is represented by two features, it may be represented by a point in two-dimensional space. Similarly, if the object to be classified is represented by n features, collectively referred to as a “feature vector”, it may be represented by a point in n-dimensional space. The simplest form of a support vector machine defines a plane in the n-dimensional space, also known as a hyperplane, which separates feature vector points associated with objects “in the class” from feature vector points associated with objects “not in the class.” For example, referring to FIG. 1, hyperplane 22 separates feature vector points, denoted by circles 28, associated with objects “in the class” from feature vector points, denoted by squares 30, associated with objects “not in the class.” A number of classes can be defined by defining a number of hyperplanes. The hyperplane defined by a trained support vector machine is the plane that maximizes the distance from the plane to the closest points, also referred to as support vectors, “in the class” and “not in the class.” Thus, the hyperplane lies equidistant from the closest points (support vectors) “in the class” and “not in the class.” Referring again to FIG. 1, the “in the class” support vector 24 and the “not in the class” support vector 26 are both located at a distance “d” from the hyperplane 22. The hyperplane that maximizes the distance “d” is sought, because the support vector machine defined by such a hyperplane is robust to input noise.
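The geometric description above may be illustrated by the following sketch, which does not train a support vector machine but merely compares two hand-chosen separating hyperplanes on an invented two-feature data set. The points, the hyperplane coefficients, and the function names `signed_distance` and `margin` are assumptions made for illustration. A hyperplane is written as w·x + b = 0, and the distance from a point x to it is |w·x + b| / ||w||.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, in_class, not_in_class):
    """Distance from the hyperplane to the closest point on the correct side.

    The result is negative if any point lies on the wrong side.
    """
    distances = [signed_distance(w, b, x) for x in in_class]
    distances += [-signed_distance(w, b, x) for x in not_in_class]
    return min(distances)


# Invented feature vectors with two features each.
IN_CLASS = [(3.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
NOT_IN_CLASS = [(1.0, 1.0), (1.0, 3.0), (0.0, 2.0)]

# Two hyperplanes that both separate the data.
w_wide, b_wide = (1.0, 0.0), -2.0      # the vertical line x1 = 2
w_narrow, b_narrow = (1.0, 0.5), -3.0  # a tilted line passing closer to the points

margin_wide = margin(w_wide, b_wide, IN_CLASS, NOT_IN_CLASS)
margin_narrow = margin(w_narrow, b_narrow, IN_CLASS, NOT_IN_CLASS)
```

In this invented data set the closest points of opposite classes are two units apart, so no separating hyperplane can have a margin greater than one unit; the first hyperplane attains that value and lies equidistant from the closest points on each side, which is the property a trained support vector machine seeks.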