Semi-supervised learning algorithms, which use both labeled and unlabeled data, have been applied to text classification in order to reduce the need to label documents by hand. For classification in natural language call routing applications, a wide range of statistical classifiers has been proposed in recent years in order to maximize classification accuracy and thus call routing accuracy. Common classifier types include decision tree classifiers, Naïve Bayes classifiers, expectation maximization and maximum entropy classifiers. The performance of such classifiers typically depends on the amount of training data available.
Additionally, there has been considerable effort in the area of unsupervised and/or semi-supervised adaptation of both language models and call category classifiers. In one example, a boosting algorithm is used to improve a classifier iteratively in order to minimize training error. In another example, multiple classifiers, such as support vector machine, maximum entropy and Naïve Bayes classifiers, are used to automatically annotate unlabeled data; when all classifiers agree on a label for an utterance, that utterance is added to the training corpus. In another example, it was reported that the classification rate may be improved with a classification model trained solely on automatic speech recognition (ASR) results, as compared to a model trained on small amounts of transcribed utterances. In a last example, a bootstrapping methodology may be used for semi-automatic annotation using support vector machines.
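The agreement-based annotation approach mentioned above can be illustrated with a minimal sketch. This is not the implementation from any of the cited examples: the function names (agreed_label, auto_annotate), the two call categories and the three keyword-based toy classifiers are all hypothetical stand-ins, assumed here only to show the unanimity rule. In practice the classifiers would be trained models such as a support vector machine, a maximum entropy model and a Naïve Bayes model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A classifier is modeled as any callable mapping an utterance to a label.
# This interface is an assumption made for the sketch.
Classifier = Callable[[str], str]


def agreed_label(utterance: str, classifiers: Sequence[Classifier]) -> Optional[str]:
    """Return the predicted label only if every classifier predicts the same one."""
    labels = {clf(utterance) for clf in classifiers}
    return labels.pop() if len(labels) == 1 else None


def auto_annotate(
    unlabeled: Sequence[str], classifiers: Sequence[Classifier]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split utterances into (utterance, label) pairs that all classifiers
    agree on, and utterances left unlabeled because of disagreement."""
    accepted: List[Tuple[str, str]] = []
    rejected: List[str] = []
    for utterance in unlabeled:
        label = agreed_label(utterance, classifiers)
        if label is None:
            rejected.append(utterance)
        else:
            accepted.append((utterance, label))
    return accepted, rejected


# Hypothetical toy classifiers standing in for trained models.
def clf_a(utterance: str) -> str:
    return "billing" if "bill" in utterance.lower() else "support"


def clf_b(utterance: str) -> str:
    text = utterance.lower()
    return "billing" if ("bill" in text or "charge" in text) else "support"


def clf_c(utterance: str) -> str:
    text = utterance.lower()
    return "billing" if ("bill" in text or "invoice" in text) else "support"
```

Calling auto_annotate on a list of unlabeled utterances with [clf_a, clf_b, clf_c] returns the unanimously labeled pairs, which could then be appended to the training corpus, together with the utterances on which the classifiers disagreed, which would remain candidates for manual annotation.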
Practical applications such as automated call centers and commercial spoken dialogue systems need ongoing updates to increase caller satisfaction and achieve higher user response percentages. However, the above-noted conventional approaches are limited in scope and do not account for ongoing optimization efforts, recursive iteration procedures and increasing accuracy measures. Conventional commercial spoken dialogue systems include applications that are usually updated as part of formal software package releases. Also, the creation of classification grammars in natural language call routing applications requires expensive manual annotation to update the callers' intents and other information extracted from caller selections or utterances.