Statistical speech and language processing systems are used in a variety of industries such as travel, automotive, and financial services. Such systems may receive a spoken utterance from a human user, process the utterance to extract some relevant semantic information, and use the semantic information to perform an action in response to the user's utterance. For example, an interactive voice response system (IVRS) may receive speech input from the user, classify the speech input to understand the intent of the user, and perform one or more actions in response to the user's speech input (e.g., perform a search, execute one or more commands, navigate a website or the internet, route calls, etc., based on the classification of the speech input).
To enable a speech application to perform classification of speech input, the speech application may be trained with a predetermined set of training data that models real-world utterances spoken by users. Such training data can be generated from a sample corpus of speech or from samples of user utterances obtained via an existing or deployed system or application that receives relevant user speech. The sample utterances in the training data are typically grouped or clustered, and each cluster is labeled according to one or more similarities or shared traits that are characteristic of the cluster (e.g., labeled according to a shared semantic meaning of the utterances in respective clusters).
In a speech application such as an IVRS, an action may be associated with each cluster as appropriate for the given speech application or system. The resulting labeled clusters may then provide a basis for classifying actual user utterances during operation of the system so that appropriate action may be taken in response to user speech input (e.g., executing a speech command, performing a search, routing calls, or otherwise navigating a speech-enabled application based on the classification given to a respective user utterance). That is, speech input may be received by the system and classified, and one or more actions taken based on the classification.
The training process typically involves identifying a desired number of clusters in the training data, locating a cluster center for each data cluster, and labeling the identified clusters with an associated classification. Clustering algorithms generally process a given set of data to identify data clusters in the distribution and determine a characteristic point or cluster center for each cluster (e.g., a cluster mean, centroid, or other generally centrally located point of the cluster). Each observation or data point in the training data may be categorized as belonging to the cluster it is nearest to by identifying the cluster center at the least distance from the respective observation.
Ideally, training data is distributed in a given space such that clusters tend to include data having one or more shared relationships (e.g., the data in each cluster belongs to a respective classification of interest). When identified clusters (e.g., characterized by the cluster center and/or one or more additional cluster attributes) have been established, i.e., fit to the data, the established clusters may be labeled to indicate the corresponding classification associated with the data. The classifications by which clusters are labeled may represent one or more shared relationships, properties, or traits of the clustered data that are of interest (e.g., the semantic meaning of user utterances in a speech application), and/or the clusters may be labeled with an indication of one or more actions that should be taken responsive to receiving speech of the corresponding classifications.
Thus, clustering is often performed using an algorithm that seeks to identify a predetermined number of clusters in the data and determine a cluster center for each identified cluster. For example, the K-means algorithm partitions a set of data (observations) into k clusters and assigns each observation to the cluster having the closest mean. Accordingly, each of the k clusters may be defined, at least in part, by the mean of the identified cluster of data, and labeled with an identifier indicating the classification that the cluster represents. Each observation in the data may also be labeled according to the cluster mean to which it is most closely located.
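For illustration only, the partition-and-assign procedure described above can be sketched as follows. This is a minimal sketch of the standard K-means iteration, not an implementation taken from any particular system; the function name kmeans, its parameters, and the two-dimensional sample points are assumptions introduced here.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition points (tuples of floats) into k clusters.

    Returns (means, assignments), where assignments[i] is the index of the
    cluster mean closest to points[i]. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    means = rng.sample(points, k)  # start from k observations chosen at random
    assignments = None
    for _ in range(iterations):
        # Assignment step: each observation goes to the cluster with the closest mean.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        # Update step: recompute each mean from the observations assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_assignments == assignments:
            break  # assignments stopped changing, so the means are stable
        assignments = new_assignments
    return means, assignments


# Assumed toy observations forming two well-separated groups.
sample_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cluster_means, cluster_assignments = kmeans(sample_points, k=2)
```

Each returned mean could then be paired with a classification label, and each observation inherits the label of the mean it is assigned to. In general the result may depend on the initial means, so practical systems often run several initializations.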
Another clustering method uses Gaussian Mixture Models (GMMs) to model a given set of data by fitting a designated number of Gaussians to the data. An expectation-maximization (EM) algorithm may be used to identify at least the means and standard deviations of the k Gaussian components that best fit the data. After fitting the GMM to the data, each Gaussian component may be labeled with one of a desired number of classifications corresponding to the type of data in the cluster to which the respective component is fit. Other algorithms are also available that generally seek to locate the center of clusters of data and optionally to associate data with the most proximately located cluster center.
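As a rough illustration of the EM fitting described above, the following sketch fits a one-dimensional Gaussian mixture. It assumes scalar observations and caller-supplied initial means; the names gaussian_pdf and fit_gmm_1d, the min_std floor, and the sample data are hypothetical. EM generally converges to a local optimum of the likelihood, so the result can depend on the initialization.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, init_means, iterations=50, min_std=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    k = len(init_means)
    n = len(data)
    means = list(init_means)
    overall_mean = sum(data) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    stds = [max(overall_std, min_std)] * k  # start every component wide
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            likes = [w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, stds)]
            total = sum(likes) or 1e-300  # guard against numerical underflow
            resp.append([like / total for like in likes])
        # M-step: re-estimate weight, mean, and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component owns no data; leave its parameters as-is
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)  # floor avoids a collapsed Gaussian
    return weights, means, stds


# Assumed toy observations drawn around two distinct values.
observations = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mix_weights, mix_means, mix_stds = fit_gmm_1d(observations, init_means=[0.0, 6.0])
```

Unlike the hard assignment of K-means, the E-step here gives each observation a fractional membership in every component, which is why a standard deviation is estimated alongside each mean.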
Established and labeled clusters may then be used to classify new data (e.g., data not necessarily in the training data) as belonging to one of the labeled clusters based on similarity (e.g., based on which cluster the new data is closest to in a given feature space). For example, the new data may be compared to each of the labeled clusters (e.g., the center or mean of a cluster) to evaluate which cluster the new data is nearest. The new data may then be labeled according to the cluster it most closely corresponds to from a proximity standpoint and a desired action may be performed based on the classification of the new data.
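The classification step described above might be sketched as follows. The cluster labels, the two-dimensional centers, and the action strings are all hypothetical placeholders chosen for illustration; a deployed system would derive its features from the utterance and its centers from training.

```python
import math

# Hypothetical labeled clusters: each classification label maps to a cluster
# center in an assumed 2-D feature space.
LABELED_CENTERS = {
    "check_balance": (0.0, 0.0),
    "book_flight": (10.0, 10.0),
}

# Hypothetical actions associated with each classification.
ACTIONS = {
    "check_balance": lambda: "routing to account services",
    "book_flight": lambda: "routing to reservations",
}


def classify(features, labeled_centers):
    """Return the label of the cluster center nearest to the feature vector."""
    return min(
        labeled_centers,
        key=lambda label: math.dist(features, labeled_centers[label]),
    )


def respond(features):
    """Classify new data, then perform the action tied to that classification."""
    label = classify(features, LABELED_CENTERS)
    return label, ACTIONS[label]()


# New data not present in the training set (assumed feature values).
label, outcome = respond((9.0, 8.5))
```

Euclidean distance is used here only as one plausible proximity measure; the appropriate measure depends on the feature space in which the clusters were established.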