Speech recognition (SR) engines for the English language need spelling wizards mainly because the engine lexicon cannot include all, or even substantially all, proper nouns, particularly names. The set of Chinese characters, on the other hand, can be considered closed, since all the characters are included in the lexicon, so out-of-vocabulary characters need not be considered. The major problem in a Chinese SR engine is instead homophony: many different characters share the same pronunciation. There are about 47,000 valid Chinese characters, but only about 1,600 distinct, fixed syllables in the Chinese language. Consequently, if the characters were evenly distributed across the syllables, each syllable would correspond to about 23-31 different characters, many of which have different meanings.
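The syllable-to-character imbalance can be made concrete with a small sketch. It computes the even-distribution average from the two figures quoted above and inverts a character-to-syllable lexicon into a homophone index. The lexicon here is a tiny illustrative sample chosen for this example (toneless Pinyin, matching the toneless /l ih/ notation used in the text); it is an assumption, not data from any real SR engine.

```python
from collections import defaultdict

# Figures quoted in the text.
NUM_CHARACTERS = 47_000
NUM_SYLLABLES = 1_600

# Illustrative sample lexicon: character -> toneless Pinyin syllable.
# A real lexicon would hold tens of thousands of entries.
LEXICON = {
    "出": "chu", "初": "chu", "除": "chu",
    "李": "li", "里": "li", "理": "li", "力": "li", "立": "li", "利": "li",
}


def average_homophones(num_characters: int, num_syllables: int) -> float:
    """Average characters per syllable if characters were spread evenly."""
    return num_characters / num_syllables


def build_homophone_index(lexicon: dict[str, str]) -> dict[str, list[str]]:
    """Invert character -> syllable into syllable -> list of characters."""
    index: dict[str, list[str]] = defaultdict(list)
    for char, syllable in lexicon.items():
        index[syllable].append(char)
    return dict(index)


if __name__ == "__main__":
    print(f"{average_homophones(NUM_CHARACTERS, NUM_SYLLABLES):.1f}")  # 29.4
    print(build_homophone_index(LEXICON)["li"])  # ['李', '里', '理', '力', '立', '利']
```

Even this toy sample maps six characters to the single syllable "li", which is the kind of collision a recognizer must resolve from context alone.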
Fixed syllables are the set of syllables associated with valid Chinese characters. For example, the syllable “chu” corresponds to characters such as  (/ch uh/ in Pinyin), but there is no syllable “chiu” that corresponds to any valid character. Because the number of fixed syllables is limited, a significant number of characters share the same pronunciation. The following is an example of fifty-four characters that share the pronunciation /l ih/, and even this list is not comprehensive:  
Consequently, when the engine fails to recognize a word correctly, users may try to correct it by choosing from an alternate list or by repeatedly voicing the desired word, often without success, for the following reasons.
First, if the voiced audio is not processed correctly by the acoustic model (AM), or if the AM assigns the desired word a lower relevance score than other words while those other words also have higher language model (LM) scores, then no matter how many times the user voices the word, the correct word may never be output or offered in the alternate list.
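Why repetition does not help can be illustrated with a minimal sketch, assuming the common log-linear decoder score (AM log-likelihood plus weighted LM log-probability). All scores, the jitter range, and the names `Hypothesis`, `recognize_once`, and `HYPOTHESES` are hypothetical choices for this example. Each repetition perturbs the AM scores slightly, but because the competing word leads on both the AM and LM scores by more than the perturbation can close, the desired word never reaches the top.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    word: str
    am_log_prob: float  # acoustic-model log-likelihood (illustrative value)
    lm_log_prob: float  # language-model log-probability (illustrative value)


def combined_score(am: float, lm: float, lm_weight: float = 1.0) -> float:
    """Log-linear combination: AM score plus weighted LM score."""
    return am + lm_weight * lm


def recognize_once(hyps: list[Hypothesis], rng: random.Random,
                   jitter: float = 0.5) -> list[str]:
    """Rank hypotheses for one utterance; AM scores vary slightly per attempt."""
    scored = [
        (combined_score(h.am_log_prob + rng.uniform(-jitter, jitter), h.lm_log_prob), h.word)
        for h in hyps
    ]
    scored.sort(reverse=True)  # highest combined score first
    return [word for _, word in scored]


# The competitor leads on both AM and LM scores (total -8 vs. -12).
HYPOTHESES = [
    Hypothesis("desired", am_log_prob=-6.0, lm_log_prob=-6.0),
    Hypothesis("competitor", am_log_prob=-5.0, lm_log_prob=-3.0),
]

if __name__ == "__main__":
    rng = random.Random(0)
    tops = [recognize_once(HYPOTHESES, rng)[0] for _ in range(10)]
    print(tops.count("desired"))  # 0: repeating the word never changes the winner
```

The gap of 4 between the totals exceeds the largest possible swing from jitter (at most 1), so the outcome is the same for any random seed.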
Second, even if the voiced audio is processed correctly by the AM, the desired character may rank below the maximum number of alternates that the list can display, so it is never presented to the user, and the user cannot obtain the word without typing it. This is likely to happen in Chinese, especially when the characters are also homophones of digits or numbers, because the SR engine also displays the number in different ITN (Inverse Text Normalization, such as normalizing “twelve” to “12”) formats, which occupy slots in the alternate list.
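The crowding effect can be sketched as follows, assuming a fixed-size alternate list and an ITN step that expands a number character into several written forms. The list size (`MAX_ALTERNATES`), the ITN table, and the candidate ranking for the syllable "shi" are hypothetical values chosen for illustration; no real engine's configuration is implied. Here 十 ("ten") shares the toneless syllable "shi" with 是, 事, 市, 时, and 石.

```python
MAX_ALTERNATES = 5  # hypothetical number of alternates shown to the user

# Hypothetical ITN table: number character -> written forms to display.
ITN_FORMS = {"十": ["十", "10"]}

# Illustrative ranking of candidates for the spoken syllable "shi".
RANKED = ["是", "十", "事", "市", "时", "石"]


def expand_with_itn(ranked_chars: list[str]) -> list[str]:
    """Replace each number character with all of its ITN renderings."""
    expanded: list[str] = []
    for ch in ranked_chars:
        expanded.extend(ITN_FORMS.get(ch, [ch]))
    return expanded


def alternate_list(ranked_chars: list[str],
                   max_items: int = MAX_ALTERNATES) -> list[str]:
    """Truncate the ITN-expanded candidates to the displayable list size."""
    return expand_with_itn(ranked_chars)[:max_items]


if __name__ == "__main__":
    print(RANKED[:MAX_ALTERNATES])  # ['是', '十', '事', '市', '时']
    print(alternate_list(RANKED))   # ['是', '十', '10', '事', '市']
```

Without ITN expansion, 时 would occupy the fifth slot; the extra "10" entry pushes it off the list, so a user who wanted 时 could not select it.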
Third, even though the Chinese SR engine has no out-of-vocabulary characters to consider, users can create new words by combining different characters. In addition, there are no spaces between words to mark word boundaries. To determine word boundaries, Asian languages (at least Simplified Chinese (CHS), Traditional Chinese (CHT), and Japanese (JPN)) require word-breaking in the engine or in the IME (input method editor) process. Consequently, when a user dictates a proper noun such as a personal name, which is very likely to be an unknown word, to the Chinese SR engine, the SR is unlikely to process the name correctly unless the name is very common and appears in the training data. Even if the AM and LM work perfectly, users may still receive an output name with characters such as  (the focus being on the second character, where the first character is a family name and the second character is a first name), which differs from the desired output  because of the homophone issue described earlier; that is,  and  are homophones but are used as the first names of different persons. The same ambiguity arises in spoken Chinese conversation: when one person tells another his name, the listener often has to ask exactly which characters the name uses.
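To show why an unknown name is hard to word-break, here is a sketch of forward maximum matching, a classic dictionary-based segmentation baseline. It is not claimed to be the method used by any particular SR engine or IME. The toy lexicon and the placeholder name 王小明 are assumptions for this example: the family name 王 is in the lexicon, but the given name 小明 is not, so the segmenter falls back to single characters.

```python
def forward_max_match(text: str, lexicon: set[str], max_word_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation.

    At each position, try the longest candidate first and shrink it until it
    is found in the lexicon; a single character is always accepted as a
    fallback, which is what happens to out-of-lexicon names.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon: "teacher", "is", and the family name 王; the given name is absent.
LEXICON = {"老师", "是", "王"}

if __name__ == "__main__":
    print(forward_max_match("王小明是老师", LEXICON))
    # ['王', '小', '明', '是', '老师']
```

Once the name is split into isolated characters, each one is effectively chosen on its own from its homophone set, with little word-level context to favor the intended character.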
Lastly, when users try to correct characters in a word during the SR process, they may obtain the correct output by selecting the character(s) and/or voicing them repeatedly until the correct character appears in the alternate list. This often succeeds when the characters do not have many homophones. However, such a correction is made character by character, and the SR does not learn from it, because the SR learns by word, not by single character. Consequently, if the user needs this character several times in a document, the user must repeat the correction process each time the character is spoken. Conventional recognition processes are therefore cumbersome and inefficient.
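The cost of character-level correction can be modeled with a toy sketch, assuming an adaptation store that, as described above, learns only whole-word (multi-character) corrections. The class `WordLevelAdapter`, the helper `manual_corrections`, and the name pair 李力/李丽 (力 and 丽 are both pronounced lì) are hypothetical illustrations, not any engine's real API.

```python
from typing import Callable


class WordLevelAdapter:
    """Hypothetical user-adaptation store that learns whole-word corrections only."""

    def __init__(self) -> None:
        self.learned: dict[str, str] = {}  # recognized form -> user's correction

    def record_correction(self, recognized: str, corrected: str) -> None:
        # Single-character replacements are applied to the text but not
        # learned, mirroring word-based learning.
        if len(corrected) > 1:
            self.learned[recognized] = corrected

    def apply(self, recognized: str) -> str:
        """Return the learned correction for a recognized word, if any."""
        return self.learned.get(recognized, recognized)


def manual_corrections(adapter: WordLevelAdapter, recognized_words: list[str],
                       target: str, fix: Callable[[str], tuple[str, str]]) -> int:
    """Count how often the user must intervene to turn each output into target.

    `fix` maps a wrong output to the (recognized, corrected) pair the user submits.
    """
    count = 0
    for word in recognized_words:
        if adapter.apply(word) != target:
            count += 1
            adapter.record_correction(*fix(word))
    return count


if __name__ == "__main__":
    document = ["李力"] * 3  # the name is misrecognized at each of three mentions
    char_fix = lambda w: ("力", "丽")  # user replaces only the second character
    word_fix = lambda w: (w, "李丽")   # user corrects the whole word
    print(manual_corrections(WordLevelAdapter(), document, "李丽", char_fix))  # 3
    print(manual_corrections(WordLevelAdapter(), document, "李丽", word_fix))  # 1
```

In this model, a character-level fix must be repeated at every occurrence, while a whole-word correction is learned once and reused.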