Speech recognition and synthesis technologies often use text normalization techniques to reduce the vocabulary against which language is processed. With a smaller vocabulary, a speech recognition or speech synthesis system may operate faster and more efficiently.
Some text normalization techniques include conversion of symbols and digits. Such conversion may be performed by rules, such as converting the symbol “#” to the word “number”. Another technique may involve homonyms, such as converting the tradenames “Lowe's” and “Loews” to “Lows”, as defined in a dictionary. Still another technique may involve breaking a word into common prefixes and suffixes, as defined in a dictionary.
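The three techniques above may be sketched as follows. This is a minimal illustration, not a description of any particular system: the function names, the digit table, and the prefix and suffix lists are hypothetical, and only the “#” rule and the “Lows” dictionary entries are taken from the text.

```python
# Hypothetical tables. Only the "#" rule and the "Lows" entries come from
# the text; the digit words and the affix lists are illustrative assumptions.
SYMBOL_RULES = {"#": "number"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
HOMONYM_DICT = {"lowe's": "lows", "loews": "lows"}
PREFIXES = ["un", "re"]
SUFFIXES = ["ing", "ed"]


def convert_symbols_and_digits(text):
    """Replace each known symbol or ASCII digit with its spoken word."""
    pieces = []
    for ch in text:
        if ch in SYMBOL_RULES:
            pieces.append(" " + SYMBOL_RULES[ch] + " ")
        elif ch in "0123456789":
            pieces.append(" " + DIGIT_WORDS[int(ch)] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra spaces introduced around the replacements.
    return " ".join("".join(pieces).split())


def canonical_form(word):
    """Map a word to its dictionary-defined canonical spelling, if any."""
    key = word.lower()
    return HOMONYM_DICT.get(key, key)


def split_affixes(word):
    """Split off at most one listed prefix and one listed suffix."""
    stem = word.lower()
    parts = []
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) > len(prefix):
            parts.append(prefix)
            stem = stem[len(prefix):]
            break
    suffix_found = None
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) > len(suffix):
            suffix_found = suffix
            stem = stem[:-len(suffix)]
            break
    parts.append(stem)
    if suffix_found is not None:
        parts.append(suffix_found)
    return parts
```

Under these assumed tables, convert_symbols_and_digits("Gate #5") yields "Gate number five", canonical_form("Loews") and canonical_form("Lowe's") both yield "lows", and split_affixes("reloading") yields ["re", "load", "ing"]. A practical system would likely read digits as whole numbers rather than one at a time and would use much larger dictionaries.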
The text normalization techniques yield a smaller and more consistent vocabulary. In one use, a spoken version of the name “Allen” may be expanded through a text normalization dictionary to include both “Allen” and “Alan”. Subsequent processing, such as performing a search using the spoken input, would search for all homonyms of “Alan”, including “Allen”.
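The homonym expansion described above may be sketched as follows. The sketch assumes a hypothetical in-memory list of homonym groups and a simple word-level match over text records; the names HOMONYM_GROUPS, expand_homonyms, and search, as well as the sample records, are illustrative only.

```python
# Hypothetical homonym dictionary: each set holds spellings that sound alike.
HOMONYM_GROUPS = [{"Allen", "Alan"}]


def expand_homonyms(word):
    """Return every spelling in the word's homonym group, or just the word."""
    for group in HOMONYM_GROUPS:
        if word in group:
            return sorted(group)
    return [word]


def search(records, spoken_word):
    """Return the records that contain any homonym of the spoken word."""
    variants = set(expand_homonyms(spoken_word))
    return [r for r in records if any(tok in variants for tok in r.split())]


# Illustrative records, not taken from the text.
records = ["Alan Turing", "Allen Newell", "Ellen Ochoa"]
matches = search(records, "Allen")  # matches both "Alan" and "Allen" records
```

Here expand_homonyms("Allen") returns ["Alan", "Allen"], so a search on the spoken input matches records under either spelling while leaving “Ellen Ochoa” unmatched.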