1. Field of the Invention
The present invention relates generally to customer self service systems for resource search and selection, and, more specifically, to a mechanism for applying clustering (an unsupervised machine learning technique) to a set of user interaction records to discover groups of similarly situated queries, with each discovered group comprising a possible new user context.
2. Discussion of the Prior Art
Currently there exist many systems designed to perform search and retrieval functions. These systems may be classified variously as knowledge management systems, information portals, search engines, data miners, etc. However, providing effective customer self service systems for resource search and selection presents several significant challenges. The first challenge for current systems with query capability is that serving queries intelligently requires a large amount of user supplied contextual information, while at the same time the user has limited time, patience, ability and interest to provide it. The second challenge is that searching without sufficient context results in a very inefficient search (both user time and system resource intensive) with frequently disappointing results (overwhelming amount of information, high percentage of irrelevant information). The third challenge is that much of a user's actual use and satisfaction with search results differs from that defined at the start of the search: either because the users behave contrary to their own specifications, or because there are other contextual issues at play that have not been defined into the search.
Although the prior art has addressed the use of unsupervised clustering to identify user context attributes, a major limitation of these approaches is that they have not based the clustering on a rich set of context identifiers. Consequently, the identified user contexts have not substantially improved the relevance of the query results based on selection of these contexts by users. Further, most of the prior art is focused on the discovery of database structure, the clustering of data within the resources, or the discovery of a relevant taxonomy for resources, as opposed to the clustering of contexts among users and user groups, which can be used predictively.
As will be hereinafter explained in greater detail, some representative prior art search and retrieval systems include Feldman, Susan, “The Answer Machine,” in Searcher: The Magazine for Database Professionals, 1, 8, Jan. 2000/58; U.S. Pat. No. 5,754,939 entitled “System for Generation of User Profiles For a System For Customized Electronic Identification of Desirable Objects”; U.S. Pat. No. 5,794,178 entitled “Visualization of Information Using Graphical Representations of Context Vector Based Relationships and Attributes”; U.S. Pat. No. 5,999,927 entitled “Method and Apparatus for Information Access Employing Overlapping Clusters”; U.S. Pat. No. 5,619,709 entitled “System and Method of Context Vector Generation and Retrieval”; and U.S. Pat. No. 5,787,422 entitled “Method and Apparatus for Information Access Employing Overlapping Clusters.”
For example, the article by Susan Feldman entitled “The Answer Machine” discusses generally how the use of learning may make systems dynamic; however, the learning systems it describes appear to be focused on learning a taxonomy or relationships among document categories or topics. Such learning systems may detect the rise of new terms. For example, the Semio system (http://www.semio.com/products/semiotaxonomy.html) creates taxonomies or hierarchies automatically. However, none of the prior art learning systems is focused on, or uses, user contexts. Moreover, no system in the prior art is directed to discovering clusters in user behaviors (user context clusters).
U.S. Pat. No. 5,754,939 describes a method for customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular a system that automatically constructs both a “target profile” for each target object in the electronic media (based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles) and a “target profile interest summary” for each user, which describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank ordered listing of target objects most likely to be of interest to each user, so that the user can select from among these potentially relevant target objects, which were automatically selected by the system from the plethora of target objects that are profiled on the electronic media.
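By way of illustration only, the profile matching summarized above may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the cited patent: the function names, the whitespace tokenization, the ratio-based word weighting, and the dot-product scoring are all hypothetical choices made for the example.

```python
from collections import Counter

def target_profile(article, corpus_counts, corpus_total):
    """Weight each word by its in-article frequency relative to its
    frequency across all articles (an assumed weighting scheme)."""
    words = article.lower().split()
    counts = Counter(words)
    n = len(words)
    return {
        w: (c / n) / (corpus_counts[w] / corpus_total)
        for w, c in counts.items()
    }

def rank_articles(articles, interest):
    """Return article indices ordered from most to least relevant to a
    user's interest summary (a hypothetical word -> weight mapping)."""
    all_words = [w for a in articles for w in a.lower().split()]
    corpus_counts = Counter(all_words)
    corpus_total = len(all_words)
    scored = []
    for idx, article in enumerate(articles):
        profile = target_profile(article, corpus_counts, corpus_total)
        # Dot product of the target profile with the interest summary.
        score = sum(profile.get(w, 0.0) * wt for w, wt in interest.items())
        scored.append((score, idx))
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored]

# Hypothetical sample data.
articles = [
    "stocks rally as markets rise",
    "team wins the final match",
    "markets fall on rate fears",
]
interest = {"markets": 1.0, "stocks": 0.5}
ranking = rank_articles(articles, interest)
```

In this example the first article mentions both words of interest, the third mentions one, and the second mentions neither, so the ranking places them in that order.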
U.S. Pat. No. 5,794,178 describes a system and method for automatically generating context vectors representing conceptual relationships among information items by quantitative means for use in storage and retrieval of documents and other information items and for displaying them visually to a user. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of “windowed co-occurrence”. Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, knowledge base, or conceptual hierarchy, is required. Summary vectors of records may be clustered to reduce searching time, by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and/or document feedback. Thus, context vectors are translated into visual and graphical representations to thereby provide user visualization of textual information and enable visual representations of meaning so that users may apply human pattern recognition skills to document searches.
U.S. Pat. No. 5,999,927 describes a method and apparatus for document clustering-based browsing of a corpus of documents, and in particular the use of overlapping clusters to improve recall. More specifically, it is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations.
U.S. Pat. No. 5,619,709 is directed to a system and method for generating context vectors for use in storage and retrieval of documents and other information items. Context vectors represent conceptual relationships among information items by quantitative means. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of “windowed co-occurrence”. Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, thesaurus, synonym list, knowledge base, or conceptual hierarchy, is required. Summary vectors of records may be clustered to reduce searching time, by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and/or document feedback. This system further facilitates visualization of textual information by translating context vectors into visual and graphical representations. Thus, a user can explore visual representations of meaning, and can apply human visual pattern recognition skills to document searches. However, no teaching is provided for finding and clustering user contexts.
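By way of illustration only, the counting step that underlies a “windowed co-occurrence” technique may be sketched as follows. This sketch covers only the tallying of word pairs that appear within a fixed window of one another; it does not reproduce the neural network training or the context vectors of the cited patents, and the function name and window size are hypothetical.

```python
from collections import defaultdict

def windowed_cooccurrence(tokens, window=2):
    """Count unordered word pairs that occur within `window` positions
    of each other in a token sequence."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pair = tuple(sorted((word, tokens[j])))
            counts[pair] += 1
    return dict(counts)

# Hypothetical sample record.
tokens = "the cat sat on the mat".split()
co = windowed_cooccurrence(tokens, window=2)
```

Words that frequently fall within the same window accumulate higher counts, which is the kind of proximity signal from which relationship-based vectors could subsequently be derived.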
U.S. Pat. No. 5,787,422 is directed to a method and apparatus for document clustering-based browsing of a corpus of documents, and in particular to the use of overlapping clusters to improve recall. That patent is directed to improving the performance of information access methods and apparatus through the use of non-disjoint (overlapped) clustering operations.
It would be highly desirable to provide for a customer self service system an automatic clustering process that discovers related queries and enables the generation of new relevant context terms and corresponding icons used to describe the users and their interactive situations.
It would also be highly desirable to suggest new contexts for the administrator to label and build an icon for, thus providing a semi-automated process involving explicit administrator control.
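By way of illustration only, the kind of clustering of user interaction records contemplated above may be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: each record is assumed to be a set of context attribute values, similarity is assumed to be Jaccard overlap, and a simple single-pass “leader” clustering is substituted for whatever unsupervised technique an embodiment would actually use. The attribute names and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of context attribute values."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def leader_cluster(records, threshold=0.5):
    """Single-pass clustering: a record joins the first cluster whose
    leader is similar enough; otherwise it starts a new cluster."""
    clusters = []  # each entry: {"leader": set, "members": [indices]}
    for idx, rec in enumerate(records):
        for cluster in clusters:
            if jaccard(rec, cluster["leader"]) >= threshold:
                cluster["members"].append(idx)
                break
        else:
            clusters.append({"leader": rec, "members": [idx]})
    return [cluster["members"] for cluster in clusters]

# Hypothetical user interaction records (context attributes per query).
records = [
    {"role:student", "device:mobile", "goal:learn"},
    {"role:student", "device:mobile", "goal:review"},
    {"role:manager", "device:desktop", "goal:purchase"},
    {"role:manager", "device:desktop", "goal:compare"},
]
groups = leader_cluster(records, threshold=0.5)
```

Each resulting group of record indices is a candidate new user context that could be presented to an administrator for labeling and for association with an icon.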