The present invention relates generally to speech recognition systems, and in particular, to the handling of compound words in the recognition results of such systems.
Computer recognition of speech is a field of great complexity. Speech recognition poses difficult problems in many areas and is never easy, and individual languages present different problems that affect overall recognition success. For several years, speech recognition systems were “isolated word” systems that required a user to pause between words. With increased computer power available, and ever more sophisticated recognition techniques, commercially available speech recognition systems are now “large vocabulary continuous” systems in which no pausing is required between words. In fact, such systems are even more accurate when a user does not pause between words, but speaks in multiple word phrases. However, with the advent of large vocabulary continuous speech recognition systems, new problems have emerged which did not exist or were not equally significant in isolated word recognition systems.
One such problem is how to deal with compound words, that is, words formed by concatenating component word parts. Some languages, such as German and Dutch, have a relatively high percentage of compound words. As an example, the Dutch word “rentevoet” is a compound formed of constituent component parts “rente” and “voet”. Such compound words may form a significant fraction of all the “out of vocabulary” (OOV) words encountered by a recognition system. However, attempting to include such compound words in the system recognition vocabulary greatly increases the size of the recognition vocabulary.
A preferred embodiment of the present invention provides a postprocessor of a speech recognition system for generating compound words from a recognition result having a sequence of recognized words representative of an input utterance, the sequence including compound word components. The postprocessor has a compound lexicon and a compounder. The compound lexicon contains a plurality of compound words composed of compound word components and connecting links. The compounder replaces, in the sequence of recognized words, adjacent words that have corresponding linked components in the lexicon with a compound word in the compound lexicon composed of the adjacent words.
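By way of illustration only, the replacement performed by the compounder can be sketched as follows. The sketch assumes a lexicon keyed by pairs of adjacent components and a greedy left-to-right scan; the names `COMPOUND_LEXICON` and `compound` are hypothetical and do not appear in the embodiments described here.

```python
from typing import Dict, List, Tuple

# Hypothetical compound lexicon: each entry links a pair of adjacent
# components to the compound word composed of them.
COMPOUND_LEXICON: Dict[Tuple[str, str], str] = {
    ("rente", "voet"): "rentevoet",
}


def compound(words: List[str], lexicon: Dict[Tuple[str, str], str]) -> List[str]:
    """Replace adjacent component pairs found in the lexicon with their compound."""
    out: List[str] = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in lexicon:
            out.append(lexicon[pair])  # linked components: emit the compound
            i += 2
        else:
            out.append(words[i])       # no linked entry: keep the word as recognized
            i += 1
    return out
```

Under these assumptions, `compound(["de", "rente", "voet", "stijgt"], COMPOUND_LEXICON)` yields a sequence in which the two components are replaced by the single word "rentevoet". Compounds of more than two components are not handled by this sketch.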
In a further embodiment, the replacement with the compound word may include an adjustment of the components for agreement for at least one of number, person, gender, and tense, or the addition of component-linking morphemes. The components may have a length greater than or equal to a selected minimum component length. The compound word entries in the compound lexicon may also include an ambiguity indicator field having a value indicating whether the components for a given compound word occur more frequently in compounded or uncompounded form.
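As a hedged sketch of how such lexicon entries might be represented, the following assumes a two-component entry carrying a linking morpheme and a boolean ambiguity indicator. The class `CompoundEntry`, the threshold value, the flag values, and the policy of skipping entries flagged as mostly uncompounded are all assumptions made for illustration. Adjustment for number, person, gender, and tense is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

MIN_COMPONENT_LENGTH = 3  # assumed value for the selected minimum component length


@dataclass(frozen=True)
class CompoundEntry:
    components: Tuple[str, str]  # adjacent recognized words
    link: str                    # component-linking morpheme ("" if none)
    mostly_compounded: bool      # ambiguity indicator (illustrative values only)

    @property
    def compound(self) -> str:
        first, second = self.components
        return first + self.link + second


def build_lexicon(entries: Iterable[CompoundEntry]) -> Dict[Tuple[str, str], CompoundEntry]:
    """Keep only entries whose components all meet the minimum length."""
    return {
        e.components: e
        for e in entries
        if all(len(c) >= MIN_COMPONENT_LENGTH for c in e.components)
    }


def replacement_for(lexicon: Dict[Tuple[str, str], CompoundEntry],
                    first: str, second: str) -> Optional[str]:
    """Return the compound for two adjacent words, or None to leave them as is."""
    entry = lexicon.get((first, second))
    if entry is None or not entry.mostly_compounded:
        return None  # assumed policy: do not compound ambiguous pairs
    return entry.compound


LEXICON = build_lexicon([
    CompoundEntry(("arbeit", "platz"), "s", True),   # linking morpheme "s"
    CompoundEntry(("rente", "voet"), "", True),
    CompoundEntry(("in", "zicht"), "", True),        # dropped: "in" is too short
    CompoundEntry(("hoge", "school"), "", False),    # flagged as mostly uncompounded
])
```

Filtering short components when the lexicon is built, rather than at lookup time, is one possible design choice; it keeps very short and therefore highly ambiguous components out of the lexicon altogether.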
In another further embodiment, the compounder may produce an output representing a best recognition hypothesis and at least one alternative recognition hypothesis, such that when the compounder performs a given compound word replacement, one hypothesis is generated that contains the given compound word, and one hypothesis is generated that contains the uncompounded components of the given compound word. In such an embodiment, when the compounder performs a compound word replacement, the best recognition hypothesis either may contain the given compound word, or alternatively, the best recognition hypothesis may contain the uncompounded components of the given compound word.
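The two-hypothesis output can be pictured with the following minimal sketch. It assumes that each lexicon entry carries a flag saying whether the compounded form is the more frequent one, and that this flag decides which hypothesis is ranked best; the function name `hypotheses` and the lexicon layout are hypothetical. For simplicity, all replacements in an utterance are grouped into a single compounded hypothesis.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: component pair -> (compound word, compounded form is more frequent)
LEXICON: Dict[Tuple[str, str], Tuple[str, bool]] = {
    ("rente", "voet"): ("rentevoet", True),
}


def hypotheses(words: List[str],
               lexicon: Dict[Tuple[str, str], Tuple[str, bool]]) -> List[List[str]]:
    """Return [best, alternative] if a replacement was made, else [words]."""
    compounded: List[str] = []
    prefer_compound = True
    replaced = False
    i = 0
    while i < len(words):
        entry = lexicon.get(tuple(words[i:i + 2]))
        if entry is not None:
            word, more_often_compounded = entry
            compounded.append(word)
            prefer_compound = prefer_compound and more_often_compounded
            replaced = True
            i += 2
        else:
            compounded.append(words[i])
            i += 1
    if not replaced:
        return [list(words)]
    uncompounded = list(words)
    # One hypothesis contains the compound, the other the uncompounded components.
    if prefer_compound:
        return [compounded, uncompounded]
    return [uncompounded, compounded]
```

Keeping both forms available lets a user or a later processing stage select the alternative when the best hypothesis is not the intended one.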
In an embodiment, the compound lexicon may contain a given compound word only when the components of the given compound word are more likely to occur together in a compound word than to occur as separate words. The compound lexicon also may contain a selected number of most frequently occurring compound words present in a text corpus.
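The selection criteria for lexicon entries can be illustrated with a short sketch. It assumes that corpus counts are available for each compound in its compounded form and for its components occurring adjacently as separate words, and it compares raw counts as a stand-in for likelihood. The function name and the counts in the usage note are invented for illustration, and the sketch applies both criteria together although either may be used on its own.

```python
from typing import Dict, List, Optional


def select_compounds(compounded: Dict[str, int],
                     separate: Dict[str, int],
                     top_n: Optional[int] = None) -> List[str]:
    """Pick compounds seen more often compounded than as separate words.

    compounded: compound word -> count of the compounded form in the corpus
    separate:   compound word -> count of its components as adjacent separate words
    top_n:      optionally keep only the most frequent compounds
    """
    kept = [w for w, n in compounded.items() if n > separate.get(w, 0)]
    kept.sort(key=lambda w: (-compounded[w], w))  # most frequent first, ties by spelling
    return kept if top_n is None else kept[:top_n]
```

For example, with invented counts `{"rentevoet": 40, "huisdeur": 5, "autoweg": 12}` for the compounded forms and `{"rentevoet": 3, "huisdeur": 9}` for the separate forms, "huisdeur" is excluded because its components occur more often separately.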
A related embodiment includes an automatic speech recognition system having the postprocessor of one of the above embodiments. The automatic speech recognition system may be a large-vocabulary continuous speech recognition system.
Another preferred embodiment includes a method of a speech recognition system for postprocessing a recognition result having a sequence of recognized words representative of an input utterance so as to generate compound words, the sequence including compound word components. The method includes providing a compound lexicon that contains a plurality of compound words composed of compound word components and connecting links, and replacing, in the sequence of recognized words, adjacent words that have corresponding linked components in the lexicon with a compound word in the compound lexicon composed of the adjacent words.
In a further related embodiment, replacing with the compound word may include adjusting the components for agreement for at least one of number, person, gender, and tense, or adding component-linking morphemes. The components may have a length greater than or equal to a selected minimum component length. The compound word entries in the compound lexicon may also include an ambiguity indicator field having a value indicating whether the components for a given compound word occur more frequently in compounded or uncompounded form.
In a further related embodiment, replacing with a compound word may include producing an output representing a best recognition hypothesis and at least one alternative recognition hypothesis, such that when performing a given compound word replacement, one hypothesis is generated that contains the given compound word, and one hypothesis is generated that contains the uncompounded components of the given compound word. In such an embodiment, when performing a given compound word replacement, the best recognition hypothesis may contain the given compound word, or alternatively, the best recognition hypothesis may contain the uncompounded components of the given compound word.
In an embodiment, the compound lexicon may contain a given compound word only when the components of the given compound word are more likely to occur together in a compound word than to occur as separate words. Or, the compound lexicon may contain a selected number of most frequently occurring compound words present in a text corpus.
An embodiment also includes an automatic speech recognition system using the method of one of the above embodiments. In such an embodiment, the system may be a large-vocabulary continuous speech recognition system.
Another preferred embodiment includes an automatic speech recognition system having a recognition engine, a compound lexicon and a compounder. The recognition engine generates a recognition result having a sequence of recognized words representative of an input utterance, the sequence including compound word components. The engine uses a recognition vocabulary of words and a language model which, for a given position in the sequence of recognized words and for selected words in the recognition vocabulary, associates an occurrence probability of such word occurring at such position. The compound lexicon contains a plurality of compound words composed of compound word components and connecting links. The compounder replaces, in the sequence of recognized words, adjacent words that have corresponding linked components in the lexicon with a compound word in the compound lexicon composed of the adjacent words. In addition, the language model occurrence probabilities prevent a given component in the lexicon from occurring in the recognition result unless the given component is adjacent to at least one other component such that, for the adjacent components, the lexicon contains linked entries corresponding to a compound word in the lexicon.
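One way to picture the adjacency constraint is as a check on a finished word sequence, as in the sketch below. This is only an illustration: in the embodiment the constraint is enforced through the language model occurrence probabilities, not by a separate check, and the sketch assumes that every word appearing in a linked pair is subject to the constraint. The names `LINKED_PAIRS` and `satisfies_constraint` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical linked entries of the compound lexicon.
LINKED_PAIRS: Set[Tuple[str, str]] = {("rente", "voet")}


def satisfies_constraint(words: List[str], linked_pairs: Set[Tuple[str, str]]) -> bool:
    """True if every component has a linked neighbor on its left or right."""
    components = {c for pair in linked_pairs for c in pair}
    for i, word in enumerate(words):
        if word not in components:
            continue  # ordinary vocabulary word, not constrained
        linked_left = i > 0 and (words[i - 1], word) in linked_pairs
        linked_right = i + 1 < len(words) and (word, words[i + 1]) in linked_pairs
        if not (linked_left or linked_right):
            return False  # an isolated component would be ruled out
    return True
```

A language model that assigns zero, or negligible, probability to sequences failing such a check would have the effect that components surface in the recognition result only where the compounder can join them.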
In a further related embodiment, the replacement with the compound word may include an adjustment of the components for agreement for at least one of number, person, gender, and tense, or the addition of component-linking morphemes. The compound components may have a length greater than or equal to a selected minimum component length. The compound word entries in the compound lexicon may include an ambiguity indicator field having a value indicating whether the components for a given compound word occur more frequently in compounded or uncompounded form.
In another related embodiment, the compounder may produce an output representing a best recognition hypothesis and at least one alternative recognition hypothesis, such that when the compounder performs a given compound word replacement, one hypothesis is generated that contains the given compound word, and one hypothesis is generated that contains the uncompounded components of the given compound word. In such an embodiment, when the compounder performs a compound word replacement, the best recognition hypothesis may contain the given compound word, or alternatively, the best recognition hypothesis may contain the uncompounded components of the given compound word.
In an embodiment, the compound lexicon may contain a given compound word only when the components of the given compound word are more likely to occur together in a compound word than to occur as separate words. The compound lexicon may contain a selected number of most frequently occurring compound words present in a text corpus. In addition, each compound word may further include an occurrence probability of such compound word occurring at the location of the compound word in the sequence of recognized words. In such case, the occurrence probability of each compound word may be determined based on the occurrence probability of one of the components composing that compound word. The system may be a large-vocabulary continuous speech recognition system.
Another preferred embodiment includes a method of automatic speech recognition. The method includes generating with a recognition engine a recognition result having a sequence of recognized words representative of an input utterance, the sequence including compound word components, the engine using a recognition vocabulary of words and a language model which, for a given position in the sequence of recognized words and for selected words in the recognition vocabulary, associates an occurrence probability of such word occurring at such position, providing a compound lexicon, the lexicon containing a plurality of compound words composed of compound word components and connecting links, and replacing, in the sequence of recognized words, adjacent words that have corresponding linked components in the lexicon with a compound word in the lexicon composed of the adjacent words, wherein the language model occurrence probabilities prevent a given component in the lexicon from occurring in the recognition result unless the given component is adjacent to at least one other component such that, for the adjacent components, the lexicon contains linked entries corresponding to a compound word in the lexicon.
In a further related embodiment, replacing with the compound word may include adjusting the components for agreement for at least one of number, person, gender, and tense, or adding component-linking morphemes. The components may have a length greater than or equal to a selected minimum component length. The compound word entries in the compound lexicon may also include an ambiguity indicator field having a value indicating whether the components for a given compound word occur more frequently in compounded or uncompounded form.
In an embodiment, replacing with a compound word may include producing an output representing a best recognition hypothesis and at least one alternative recognition hypothesis, such that when performing a given compound word replacement, one hypothesis is generated that contains the given compound word, and one hypothesis is generated that contains the uncompounded components of the given compound word. When performing a given compound word replacement, the best recognition hypothesis may contain the given compound word, or alternatively, the best recognition hypothesis may contain the uncompounded components of the given compound word.
In an embodiment, the compound lexicon may contain a given compound word only when the components of the given compound word are more likely to occur together in a compound word than to occur as separate words. Each compound word may further include an occurrence probability of such compound word occurring at the location of the compound word in the sequence of recognized words. In such case, the occurrence probability of each compound word may be determined based on the occurrence probability of one of the components composing that compound word. The system may be a large-vocabulary continuous speech recognition system.
Another preferred embodiment includes a method of preparing a speech recognition system to postprocess a recognition result having a sequence of recognized words representative of an input utterance so as to generate compound words, the sequence including compound word components, the system having a language model including n-gram word models. The method includes providing an initial component lexicon, the initial component lexicon containing a plurality of compound word components, comparing a corpus of words modeled by the system to the initial component lexicon, creating entries in a compound lexicon for compound words in the corpus having compound word components in the initial component lexicon, thereby creating a compound lexicon that contains a plurality of compound words composed of compound word components and connecting links, and rewriting the language model n-grams of the compound words so as to form n-grams of the components.
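The preparation steps can be sketched as follows, under several simplifying assumptions: compounds are split into exactly two components, no connecting morphemes are handled, and a rewritten n-gram is simply the original n-gram with each compound expanded in place, so it may be longer than the original order. How a real system would re-window or renormalize the rewritten n-grams is not addressed. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def split_compound(word: str, components: Set[str]) -> Optional[Tuple[str, str]]:
    """Return the first two-way split whose halves are both known components."""
    for cut in range(1, len(word)):
        first, second = word[:cut], word[cut:]
        if first in components and second in components:
            return first, second
    return None


def build_compound_lexicon(corpus_words: Iterable[str],
                           components: Set[str]) -> Dict[str, Tuple[str, str]]:
    """Compare corpus words to the component lexicon and record the compounds found."""
    lexicon: Dict[str, Tuple[str, str]] = {}
    for word in corpus_words:
        parts = split_compound(word, components)
        if parts is not None:
            lexicon[word] = parts
    return lexicon


def rewrite_ngrams(ngram_counts: Dict[Tuple[str, ...], int],
                   lexicon: Dict[str, Tuple[str, str]]) -> Counter:
    """Expand each compound in each n-gram into its components, merging counts."""
    rewritten: Counter = Counter()
    for ngram, count in ngram_counts.items():
        expanded = []
        for token in ngram:
            expanded.extend(lexicon.get(token, (token,)))
        rewritten[tuple(expanded)] += count
    return rewritten
```

With an initial component lexicon of `{"rente", "voet"}`, the corpus word "rentevoet" produces a compound lexicon entry, and a bigram such as `("de", "rentevoet")` is rewritten to the component sequence `("de", "rente", "voet")` with its count preserved.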
In such an embodiment, providing an initial component lexicon may also include providing a compound grammar specifying how compound word components may be combined into compound words. The compound grammar may use rules based on a part-of-speech characteristic of each component.
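A compound grammar based on part-of-speech characteristics might look like the following minimal sketch. The tag set, the tag assignments, and the two rules shown are assumptions chosen for illustration and are not taken from the embodiments described here.

```python
from typing import Dict, Set, Tuple

# Hypothetical part-of-speech tags for entries of the initial component lexicon.
COMPONENT_POS: Dict[str, str] = {
    "rente": "NOUN",
    "voet": "NOUN",
    "snel": "ADJ",
    "weg": "NOUN",
}

# Hypothetical grammar rules: which tag sequences may combine into a compound.
ALLOWED_PATTERNS: Set[Tuple[str, str]] = {
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN"),
}


def may_combine(first: str, second: str) -> bool:
    """True if the grammar allows the two components to form a compound."""
    pos_first = COMPONENT_POS.get(first)
    pos_second = COMPONENT_POS.get(second)
    if pos_first is None or pos_second is None:
        return False  # unknown components are never combined
    return (pos_first, pos_second) in ALLOWED_PATTERNS
```

Under these assumed rules, "snel" followed by "weg" may combine, whereas "weg" followed by "snel" may not, because no rule permits a noun followed by an adjective.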