The present invention generally relates to data processing. The invention relates more specifically to speech recognition systems.
Speech recognition systems are specialized computer systems that are configured to process and recognize human speech, and to take action or carry out further processing according to the speech that is recognized. Such systems are now widely used in a variety of applications including airline reservations, auto attendants, and order entry. Generally the systems comprise computer hardware, computer software, or a combination of the two.
Speech recognition systems typically operate by receiving an acoustic signal, which is an electronic signal or set of data that represents the acoustic energy received at a transducer from a spoken utterance. The systems then try to find a sequence of text characters ("word string") which maximizes the following probability:
P(A|W)*P(W)
where A means the acoustic signal and W means a given word string. The P(A|W) component is called the acoustic model and P(W) is called the language model.
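As a rough illustration of the maximization described above, the following Python sketch picks, from a small set of candidate word strings, the one with the largest product P(A|W)*P(W). The function name, the candidate strings, and all score values are hypothetical and chosen only for illustration; an actual recognizer searches a far larger hypothesis space and typically works with log-probabilities.

```python
def best_word_string(acoustic_scores, language_scores):
    """Return the word string W that maximizes P(A|W) * P(W).

    acoustic_scores maps each candidate W to P(A|W) (the acoustic model score).
    language_scores maps each candidate W to P(W) (the language model score).
    Candidates missing from language_scores are treated as having P(W) = 0.
    """
    return max(
        acoustic_scores,
        key=lambda w: acoustic_scores[w] * language_scores.get(w, 0.0),
    )


# Hypothetical scores for two competing word strings.
acoustic_scores = {"fly to boston": 0.20, "fly to austin": 0.25}
language_scores = {"fly to boston": 0.06, "fly to austin": 0.04}

best = best_word_string(acoustic_scores, language_scores)
```

In this made-up example the acoustic model slightly prefers "fly to austin", but the language model shifts the overall product in favor of "fly to boston".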
A speech recognizer may be improved by changing the acoustic model or the language model, or by changing both. The language model may be word-based or may have a "semantic model," which is a particular way to derive P(W).
Typically, language models are trained by obtaining a large number of utterances from the particular application under development, and providing these utterances to a language model training program which produces a word-based language model that can estimate P(W) for any given word string. Examples of these include bigram models, trigram language models, or more generally, n-gram language models.
In a sequence of words in an utterance, W0 through Wm, an n-gram language model estimates the probability that the word at position j of the utterance is Wj, given the previous n-1 words. Thus, in a trigram, P(Wj|utterance) is estimated by P(Wj|Wj-1, Wj-2). The n-gram type of language model may be viewed as relatively static with respect to the application environment. For example, static n-gram language models cannot change their behavior based upon the particular application in which the speech recognizer is being used or external factual information about the application. Thus, in this field there is an acute need for an improved speech recognizer that can adapt to the particular application in which it is used.
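The trigram estimate can be sketched as follows, assuming a simple count-based (maximum-likelihood) estimate without smoothing. The function names and the training utterances are hypothetical; practical language model training programs normally add smoothing or back-off so that unseen word sequences do not receive zero probability.

```python
from collections import Counter


def train_trigram(utterances):
    """Count trigrams and their two-word contexts in whitespace-split utterances."""
    trigram_counts = Counter()
    context_counts = Counter()
    for utterance in utterances:
        words = utterance.split()
        for j in range(2, len(words)):
            context = (words[j - 2], words[j - 1])
            trigram_counts[context + (words[j],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts


def trigram_probability(trigram_counts, context_counts, w_j, w_jm1, w_jm2):
    """Estimate P(Wj | Wj-1, Wj-2) as count(Wj-2, Wj-1, Wj) / count(Wj-2, Wj-1)."""
    context = (w_jm2, w_jm1)
    if context_counts[context] == 0:
        return 0.0  # context never seen in training; no smoothing in this sketch
    return trigram_counts[context + (w_j,)] / context_counts[context]


# Hypothetical training utterances for an airline reservation application.
training = [
    "i want to fly to boston",
    "i want to fly to denver",
    "i want to go to boston",
]
tri, ctx = train_trigram(training)

# "want to" occurs three times and is followed by "fly" twice.
p_fly = trigram_probability(tri, ctx, "fly", "to", "want")
```

Once trained, the counts are fixed, which is the sense in which such a model is static: nothing in the estimate reflects conditions that arise while the application is running.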
N-gram language models and other word-based language models work well in applications that have a large number of training utterances and in which the language model does not change over time. Thus, for applications in which large amounts of training data are not available, or where the underlying language model does change over time, there is a need for an improved speech recognizer that can produce more accurate results by taking into account application-specific information.
Other needs and objects will become apparent from the following detailed description.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved by the present invention, which comprises, in one aspect, a method of dynamically modifying one or more probability values associated with word strings recognized by a speech recognizer based on semantic values represented by keyword-value pairs derived from the word strings, comprising the steps of creating and storing one or more rules that define a change in one or more of the probability values when a semantic value matches a pre-determined semantic tag, in which the rules are based on one or more external conditions about the context in which the speech recognizer is used; determining whether one of the conditions currently is true, and if so, modifying one or more of the probability values that match the tag that is associated with the condition that is true.
According to one feature, the speech recognizer delivers the word strings to an application program. The determining step involves determining, in the application program, whether one of the conditions currently is true, and if so, instructing the speech recognizer to modify one or more of the probability values of a word string associated with a semantic value that matches the tag that is associated with the condition that is true.
Another feature involves representing the semantic values as one or more keyword-value pairs that are associated with the word strings recognized by the speech recognizer; delivering the keyword-value pairs to an application program; and determining, in the application program, whether one of the conditions currently is true, and if so, instructing the speech recognizer to modify the probability value of the word strings that are associated with the keyword-value pairs that match the tag that is associated with the condition that is true.
Yet another feature involves delivering the words and semantic values to an application program that is logically coupled to the speech recognizer; creating and storing, in association with the speech recognizer, a function callable by the application program that can modify one or more of the probability values of the word strings associated with semantic values that match the tag that is associated with the condition that is true; determining, in the application program, whether one of the conditions currently is true, and if so, calling the function with parameter values that identify how to modify one or more of the semantic values.
A related feature involves re-ordering the word strings after modifying one or more of the probability values. Another feature is modifying the probability values by multiplying one or more of the probability values by a scaling factor that is associated with the condition that is true.
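The scaling and re-ordering features can be illustrated with the following sketch. It assumes that the recognizer's output is available as a list of hypotheses, each carrying a word string, a probability value, and keyword-value pairs; this data layout, the function name, and the numbers are assumptions made for illustration and are not taken from the description.

```python
def rescale_and_reorder(n_best, tag, scaling_factor):
    """Scale hypotheses whose keyword-value pairs match the tag, then re-sort.

    n_best is a list of dicts with "words", "probability", and "semantics" keys.
    tag is a (keyword, value) pair, for example ("city", "boston").
    Returns a new list ordered from most to least probable.
    """
    rescored = []
    for hypothesis in n_best:
        probability = hypothesis["probability"]
        if tag in hypothesis["semantics"].items():
            probability *= scaling_factor
        rescored.append({**hypothesis, "probability": probability})
    return sorted(rescored, key=lambda h: h["probability"], reverse=True)


# Hypothetical recognizer output, most probable hypothesis first.
n_best = [
    {"words": "fly to austin", "probability": 0.5, "semantics": {"city": "austin"}},
    {"words": "fly to boston", "probability": 0.4, "semantics": {"city": "boston"}},
]

reordered = rescale_and_reorder(n_best, ("city", "boston"), 2.0)
```

With a scaling factor of 2.0 applied to the tag ("city", "boston"), the second hypothesis overtakes the first after re-ordering. The scaled values are no longer normalized probabilities, which does not matter when they are used only for ranking.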
In another feature, the method involves delivering one or more word-value pairs that include the semantic values to an application program that is logically coupled to the speech recognizer. A function is created and stored, in association with the speech recognizer, which can modify one or more of the probability values of word strings associated with words of word-value pairs that match the tag word that is associated with the condition that is true. The application program determines whether one of the conditions currently is true, and if so, calls the function with parameter values that identify how to modify a probability value of a word string associated with the semantic values, including a scaling factor that is associated with the condition that is true. The function may modify a probability value by multiplying the probability value by the scaling factor.
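One possible arrangement of the application program and the callable function is sketched below. The class RecognizerStub, its method scale_probability, the rule format, and the example condition (a winter month making a ski destination more likely) are all hypothetical; they stand in for whatever interface a particular recognizer actually exposes.

```python
class RecognizerStub:
    """Hypothetical stand-in for a recognizer exposing a function to the application."""

    def __init__(self, hypotheses):
        # Each hypothesis is (word string, probability value, {keyword: value}).
        self.hypotheses = list(hypotheses)

    def scale_probability(self, keyword, value, scaling_factor):
        """Multiply the probability of every hypothesis whose pairs match the tag."""
        self.hypotheses = [
            (words, prob * scaling_factor if pairs.get(keyword) == value else prob, pairs)
            for words, prob, pairs in self.hypotheses
        ]


def apply_rules(recognizer, rules, context):
    """Application-side check: call the recognizer for each rule whose condition holds."""
    for condition, keyword, value, scaling_factor in rules:
        if condition(context):
            recognizer.scale_probability(keyword, value, scaling_factor)


# Hypothetical rule: in winter months, favor hypotheses tagged with city "aspen".
rules = [(lambda ctx: ctx["month"] in (12, 1, 2), "city", "aspen", 2.0)]

recognizer = RecognizerStub([
    ("fly to austin", 0.5, {"city": "austin"}),
    ("fly to aspen", 0.3, {"city": "aspen"}),
])
apply_rules(recognizer, rules, {"month": 1})
```

Here the external condition is evaluated entirely in the application, and the recognizer only receives the tag and the scaling factor, so the rules can change without retraining the language model.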
The invention also encompasses a computer-readable medium and apparatus that may be configured to carry out the foregoing steps.