Many types of data processing systems process large bodies of stored data items. For example, contact centers (call centers) store and process large volumes of customer interaction sessions. An exemplary call recording and processing system called ULTRA is produced by Verint® Systems Inc. (Melville, N.Y.). The ULTRA system suite includes a component called Intellifind, which performs speech analytics in response to user queries. Further details regarding the ULTRA and Intellifind products can be found at www.verint.com/contact_center/index.cfm.
The processing of recorded customer sessions sometimes involves classifying the sessions into categories. In some applications, recorded customer calls are categorized in order to determine the reasons (“root causes”) for the calls. Exemplary root cause categorization methods, which are carried out by the ULTRA system, are described in a paper published by Verint Systems, entitled “The Power of Why—Using Root Cause Analysis to Drive Superior Performance,” January 2007, which is incorporated herein by reference.
Other types of systems that process large corpora of data items can be found in the field of communication interception and analysis, in which large numbers of communication sessions are intercepted, recorded and analyzed. For example, Verint Systems offers several systems and solutions for intercepting, filtering and analyzing wireline and wireless, cable and satellite, Internet, multimedia, and Voice over IP communication links. Details regarding these products can be found at www.verint.com/communications_interception.
The processing of data items sometimes involves clustering, i.e., grouping data items into clusters. Typically, a clustering process attempts to group the data items so that data items within a cluster are similar in a certain respect and data items in different clusters are dissimilar. Various automatic clustering processes are known in the art. Exemplary clustering methods are described by Goldszmidt and Sahami in “A Probabilistic Approach to Full-Text Document Clustering,” SRI International Technical Report ITAD-433-MS-98-044, 1998; by Slonim and Tishby in “Document Clustering using Word Clusters via the Information Bottleneck Method,” Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'00), Athens, Greece, Jul. 24-28, 2000, pages 208-215; by Pantel and Lin in “Document Clustering with Committees,” Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'02), Tampere, Finland, Aug. 11-15, 2002, pages 199-206; and by Dhillon in “Co-clustering Documents and Words using Bipartite Spectral Graph Partitioning,” Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), San Francisco, Calif., Aug. 26-29, 2001, pages 269-274, all of which are incorporated herein by reference.
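The general notion of clustering described above can be illustrated with a minimal sketch. It is not any of the cited methods. It assumes, purely for illustration, that data items are short text strings, that similarity is measured by Jaccard overlap of their word sets, and that a greedy single-pass assignment is acceptable. The function names `jaccard` and `cluster`, the threshold value, and the sample strings are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity of two word sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(items: list[str], threshold: float = 0.25) -> list[list[str]]:
    """Greedy single-pass clustering (illustrative assumption, not a cited method).

    Each item joins the first cluster whose first member (used here as the
    cluster's representative) is at least `threshold` similar to it;
    otherwise the item starts a new cluster.
    """
    clusters: list[list[str]] = []
    for item in items:
        words = set(item.lower().split())
        for group in clusters:
            representative = set(group[0].lower().split())
            if jaccard(words, representative) >= threshold:
                group.append(item)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([item])
    return clusters


# Hypothetical sample data standing in for transcribed customer calls.
calls = [
    "billing error on my invoice",
    "invoice shows a billing error",
    "cannot reset my password",
    "password reset link not working",
]
groups = cluster(calls)
for group in groups:
    print(group)
```

With this sample data and threshold, the two billing-related items fall into one cluster and the two password-related items into another. The result of such a greedy scheme depends on the order of the items and on the threshold chosen, which is one reason more elaborate methods such as those cited above are used in practice.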
As yet another example, U.S. Patent Application Publication 2004/0163035, whose disclosure is incorporated herein by reference, describes a method for processing non-deterministic text. The method utilizes non-textual differences between words, or sequences of words, in the text to provide useful information to users by resolving more than two decision options.