Conventionally, in speech recognition performed for a specific language, such as Japanese, when a user registers a word to be recognized, the following is performed for the written form (inscription) of the word:
   1. the sounds-like spelling of the word (which may include kanji and alphabetical characters) is entered; and
   2. a plurality of base forms (pronunciations) expected from the sounds-like spelling are compared with the user's pronunciation of the word, and the base form receiving the highest evaluation, provided that the evaluation exceeds a predetermined threshold value, is adopted and is registered in the speech recognition dictionary.
In order to reduce the work required of a user during the registration phase, the keystrokes used for a kana/kanji entry may be obtained, which reduces the labor required to input the sounds-like spelling. However, when kana/kanji entry is not employed, or when, as with an English word, the sounds-like spelling cannot be conveyed by entering keystrokes, the above method cannot be used.
Further, in many cases in Japanese, the reading (in kana) does not have a one-to-one correspondence with the pronunciation, and unless base forms are selected in accordance with information acquired from the actual pronunciation of words, high speech recognition accuracy cannot be maintained. For example, in Japanese, a plurality of pronunciations may apply to a single reading. In English, such readings do not exist, but when the spelling is used in place of the Japanese reading, the word “vase,” for example, has two pronunciations: “va-z” and “veis.” As another example, different base forms must be prepared even for the same sounds-like spelling: the Chinese characters denoted 312 in FIG. 6 mean “calf,” and the Chinese characters denoted 314 in FIG. 6 mean “lecturer.” Both can be given the same reading, shown as the kana 316. However, the pronunciation of 312 is “koushi,” whereas the pronunciation of 314 is “koo:shi.”
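The relationship described above, in which a single sounds-like spelling corresponds to several candidate base forms, can be pictured as a simple lookup table. The following is a minimal sketch only. The table name, the function name, and the kana string used as a key (an assumed rendering of the reading shown at 316) are hypothetical and are not taken from the conventional software; only the four pronunciations come from the examples above.

```python
from typing import Dict, List

# Hypothetical table: sounds-like spelling -> candidate base forms.
# The kana key is an assumed rendering of the shared reading (316 in FIG. 6).
CANDIDATE_BASE_FORMS: Dict[str, List[str]] = {
    "こうし": ["koushi", "koo:shi"],  # calf (312) / lecturer (314)
    "vase": ["va-z", "veis"],         # English spelling used in place of a reading
}


def candidates_for(sounds_like_spelling: str) -> List[str]:
    """Return the candidate base forms for a spelling, or an empty list if none are known."""
    return list(CANDIDATE_BASE_FORMS.get(sounds_like_spelling, []))
```

Because the spelling alone returns more than one candidate, the choice between them has to be made from the user's actual pronunciation.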
As is shown in FIGS. 13 and 14, according to conventional speech recognition software, a word 501 to be registered is specified, the sounds-like spelling and pronunciation of the word 501 are entered in fields 507 and 509 of an input panel 500, and the actual pronunciation of the word 501 is thereafter obtained while a recording button 503 is depressed. In this manner, for speech recognition, the word is registered in a speech recognition dictionary.
The voice information that is entered is compared with each of a plurality of base forms corresponding to the sounds-like spelling, and a check is performed to determine whether the value of the highest evaluation for a base form exceeds a predetermined threshold value. If the value of the highest evaluation exceeds the predetermined threshold value, the pertinent base form is registered in the speech recognition dictionary, together with the word 501, the sounds-like spelling 507 and the pronunciation 509.
When the value of the highest evaluation for the base form does not exceed the predetermined threshold value, a panel 520 is displayed that requests the user to enter the pronunciation of the word 501 again, and based on the voice information that is newly input, another check is performed to determine whether the value of the evaluation for the pertinent base form exceeds the predetermined threshold value. This process must be repeated until the value of the evaluation for the pertinent base form exceeds the predetermined threshold value, which places a considerable burden on the user.
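The conventional procedure described above (score each candidate base form against the utterance, accept the best one only if it exceeds the threshold, and otherwise ask the user to speak again) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the conventional software: all function and parameter names are hypothetical, and `toy_score` is a trivial stand-in for acoustic evaluation, which the text does not specify.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def toy_score(base_form: str, utterance: str) -> float:
    """Hypothetical stand-in for acoustic evaluation: 1.0 on exact match, else 0.0."""
    return 1.0 if base_form == utterance else 0.0


def best_base_form(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-scoring base form if its score exceeds the threshold, else None."""
    if not scores:
        return None
    best = max(scores, key=lambda bf: scores[bf])
    return best if scores[best] > threshold else None


def register_word(
    word: str,
    spelling: str,
    candidates: List[str],
    utterances: Iterable[str],
    threshold: float,
    dictionary: Dict[str, Tuple[str, str]],
    score: Callable[[str, str], float] = toy_score,
) -> Optional[int]:
    """Try each utterance in turn; register on the first one whose best score passes.

    Returns the number of utterances consumed, or None if none passed.
    """
    attempts = 0
    for utterance in utterances:  # each item models one press of the recording button
        attempts += 1
        scores = {bf: score(bf, utterance) for bf in candidates}
        chosen = best_base_form(scores, threshold)
        if chosen is not None:
            dictionary[word] = (spelling, chosen)
            return attempts
        # Otherwise the user would be asked to speak again (panel 520).
    return None
```

With the candidates “va-z” and “veis,” a first utterance that matches neither followed by a second that matches “veis” yields a return value of 2, which illustrates how every failed check costs the user one more recording.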