The present invention relates to text processing. In particular, the present invention relates to transforming between different forms of text.
In many speech recognition systems, the speech recognition is limited to word sequences defined in a context free grammar. Authoring such grammars can be complex because the author must take into consideration all the different ways that written text can be spoken. For example, the written number “123” can be pronounced “one two three”, “one twenty-three”, or “one hundred twenty-three”.
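To make the authoring burden concrete, the following is a minimal sketch that enumerates the three spoken forms mentioned above for a three-digit written number. The function names `two_digit` and `spoken_forms` are hypothetical, the sketch covers only three-digit strings without a leading zero, and it is offered as an illustration rather than as part of any recognizer's grammar.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def two_digit(n: int) -> str:
    """Spell out an integer in the range 0..99 (hypothetical helper)."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spoken_forms(digits: str) -> list:
    """Return several plausible spoken forms for a string such as "123"."""
    if len(digits) != 3 or not digits.isdigit() or digits[0] == "0":
        raise ValueError("sketch handles three-digit strings without a leading zero")
    first, rest = int(digits[0]), int(digits[1:])

    # Form 1: read each digit separately.
    forms = [" ".join(ONES[int(d)] for d in digits)]

    # Form 2: first digit, then the remaining pair read as one number.
    if rest >= 10:
        forms.append(f"{ONES[first]} {two_digit(rest)}")

    # Form 3: full cardinal reading.
    cardinal = f"{ONES[first]} hundred"
    if rest:
        cardinal += f" {two_digit(rest)}"
    forms.append(cardinal)
    return forms


print(spoken_forms("123"))
```

Even this narrow case needs three separate readings, and a grammar author would have to anticipate each of them for every numeric pattern the recognizer should accept.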
In addition, speech recognizers are designed to provide spoken forms of the words as output. Before displaying these spoken words, it is common to perform an inverse text normalization to convert the spoken form of the word into a written or display form. For example, the words “one two three” would be converted into “123”.
In the past, either hard-coded rules or a context free grammar has been used to perform the inverse text normalization. The hard-coded rules are time-consuming to construct. The context free grammar is very limited: it can only be used on complete words, and it cannot handle inverse text normalizations in which the order of the symbols in the display text differs from the order in the spoken text. For example, context free grammars of the prior art cannot convert “ten to twelve” into “11:50”.
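The reordering problem can be illustrated with a hand-written rule of the kind described above. The following is a minimal sketch only: the function name `itn_minutes_to_hour` and the small word tables are assumptions made for illustration, and the rule covers just the pattern "<minutes> to <hour>". Note that the hour appears last in the spoken text but first in the display text, and that both displayed values are computed rather than copied from the input.

```python
from typing import Optional

# Hypothetical lookup tables covering only a handful of spoken words.
HOURS = {word: value for value, word in enumerate(
    ["one", "two", "three", "four", "five", "six",
     "seven", "eight", "nine", "ten", "eleven", "twelve"], start=1)}
MINUTES = {"five": 5, "ten": 10, "twenty": 20, "twenty-five": 25}


def itn_minutes_to_hour(spoken: str) -> Optional[str]:
    """Convert "<minutes> to <hour>" into a clock time, or return None."""
    words = spoken.lower().split()
    if len(words) != 3 or words[1] != "to":
        return None
    minutes = MINUTES.get(words[0])
    hour = HOURS.get(words[2])
    if minutes is None or hour is None:
        return None

    # The displayed hour is the one before the spoken hour, wrapping 1 -> 12.
    previous_hour = 12 if hour == 1 else hour - 1
    # The displayed minutes count up from the hour, not down to it.
    return f"{previous_hour}:{60 - minutes:02d}"


print(itn_minutes_to_hour("ten to twelve"))
```

Each additional pattern, such as "quarter past" or "half past", would need its own rule written in the same way, which is why this approach is time-consuming to construct.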
The context free grammar for performing inverse text normalization under the prior art also only provides one output candidate per input spoken form. Since there is often more than one way to display a spoken word, this limited response is undesirable. In addition, the parsing system used to parse an input text using the context free grammar of the prior art is not as fast as desired.
Text normalization, in which the written form of a word or speech sound is converted into its spoken form, has largely been performed by hand as part of forming the context free grammar for the speech recognition engine. As a result, text normalization and inverse text normalization have been treated as separate problems that have been addressed using separate solutions. Thus, the current state of the art has required that two separate systems be built in order to provide both text normalization and inverse text normalization.