1. Field of Exemplary Embodiments
Aspects of the invention relate to language modeling, and more particularly to systems and methods that use semantic parse trees for language modeling and confidence measurement.
2. Description of the Related Art
Large vocabulary continuous speech recognition (LVCSR) often employs statistical language modeling techniques to improve recognition performance. Language modeling provides an estimate of the probability of a word sequence (or sentence) P(w1 w2 w3 . . . wN) in a language or a subdomain of a language. A prominent method in statistical language modeling is n-gram language modeling, which estimates the sentence probability by combining the probabilities of each word given the previous n−1 words.
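As an illustration of the n-gram estimate described above, the following is a minimal sketch for the case n = 2 (a bigram model), in which the sentence probability is approximated by the product of P(wi | wi−1) over all words. The function names, the sentence boundary markers `<s>` and `</s>`, the toy corpus, and the unsmoothed maximum-likelihood counts are assumptions made for this example only; practical systems typically use larger n and some form of smoothing.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Pad with assumed boundary markers so the first and last words have context.
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def sentence_probability(words, context_counts, bigram_counts):
    """Multiply unsmoothed maximum-likelihood estimates of P(cur | prev)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob

# Toy corpus, invented for illustration.
corpus = [["i", "want", "coffee"], ["i", "want", "tea"]]
contexts, bigrams = train_bigram(corpus)
p_seen = sentence_probability(["i", "want", "coffee"], contexts, bigrams)
p_unseen = sentence_probability(["i", "like", "tea"], contexts, bigrams)
```

In this sketch each word is conditioned only on its immediate predecessor, so any word sequence containing a bigram absent from the training data receives probability zero.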
Although n-gram language models achieve a certain level of performance, they are not optimal. N-grams do not accurately model the long-range dependencies or the semantic and syntactic structure of a sentence.
A problem related to modeling semantic information in a sentence is confidence measurement based on semantic analysis. Because speech recognition output is always subject to some level of uncertainty, it may be vital to employ a measure that indicates the reliability of the hypothesized words. Most approaches to confidence annotation use two basic steps: (1) generate as many features as possible based on speech recognition and/or a natural language understanding process, and (2) use a classifier to combine these features in a reasonable way.
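The two-step scheme described above can be sketched as follows. This is only an illustrative example, not the method of any cited study: the feature names, the weight values, and the choice of a logistic combination as the classifier are all assumptions made for this sketch. In practice the weights would be learned from annotated data.

```python
import math

def combine_features(features, weights, bias=0.0):
    """Step (2): logistic combination of features into a confidence in (0, 1)."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Step (1): features for one hypothesized word (hypothetical names and values).
word_features = {"acoustic_score": 0.8, "lm_score": 0.6, "word_posterior": 0.9}

# Hypothetical weights; a real classifier would estimate these from training data.
weights = {"acoustic_score": 1.0, "lm_score": 0.5, "word_posterior": 2.0}

confidence = combine_features(word_features, weights, bias=-1.5)
```

A word would then be accepted or flagged by comparing `confidence` against a threshold chosen for the application.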
There are a number of overlapping speech recognition based features that are exploited in many studies (see, e.g., R. San-Segundo, B. Pellom, K. Hacioglu and W. Ward, “Confidence Measures for Spoken Dialog Systems”, ICASSP-2001, pp. 393-396, Salt Lake City, Utah, May 2001; R. Zhang and A. Rudnicky, “Word Level Confidence Annotation Using Combination of Features”, Eurospeech-2001, Aalborg, Denmark, September 2001; and C. Pao, P. Schmid and J. Glass, “Confidence Scoring for Speech Understanding Systems”, ICSLP-98, Sydney, Australia, December 1998). For domain-independent large vocabulary speech recognition systems, posterior probability based on a word graph has been shown to be the single most useful confidence feature (see F. Wessel, K. Macherey and H. Ney, “A Comparison of Word Graph and N-best List Based Confidence Measures”, ICASSP-2000, pp. 1587-1590, Istanbul, Turkey, June 2000). Semantic information can be considered an additional information source that complements speech recognition information. In many, if not all, of the previous studies, the way semantic information is incorporated into the decision process is rather ad hoc. For example, in C. Pao et al., “Confidence Scoring for Speech Understanding Systems”, referenced above, the semantic weights assigned to words are based on heuristics. Similarly, in P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus and A. Rudnicky, “Is This Conversation on Track”, Eurospeech-2001, pp. 2121-2124, Aalborg, Denmark, September 2001, such semantic features as “uncovered word percentage”, “gap number”, “slot number”, etc. are generated experimentally in an effort to incorporate semantic information into the confidence metric.