1. Field of the Invention
The present invention relates to a speech recognition technique.
2. Description of the Related Art
Speech recognition includes speech recognition of text registration type and speech recognition of speech registration type. In speech recognition of text registration type, text registered as a speech recognition target word (speech recognition candidate) is converted into a phoneme sequence, and an acoustic model sequence corresponding to the converted phoneme sequence is used in recognition processing. By contrast, in speech recognition of speech registration type, acoustic parameters such as a cepstrum are extracted by signal processing from speech recorded as a speech recognition target word. Alternatively, the speech recorded as a speech recognition target word undergoes phoneme recognition or model sequence matching to obtain the phoneme sequence or model sequence that best expresses that speech. The obtained acoustic parameters, phoneme sequence, or model sequence are used in recognition processing.
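As an illustration of the acoustic parameter extraction mentioned above, the following is a minimal sketch of computing a real cepstrum for one speech frame, that is, the inverse discrete Fourier transform of the log magnitude spectrum. The function names `dft` and `real_cepstrum` and the `floor` parameter are assumptions made for this example; practical systems typically apply windowing, use a fast Fourier transform, and often use mel-scaled variants, none of which is shown here.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-10):
    """Return the real cepstrum of one frame of samples.

    The cepstrum is the inverse DFT of the log magnitude spectrum.
    `floor` (an assumption of this sketch) avoids log(0) on empty bins.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    # The log magnitude spectrum of a real signal is real and symmetric,
    # so the inverse DFT is real; the imaginary part is rounding error only.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


if __name__ == "__main__":
    # A scaled unit impulse has a flat spectrum, so only the zeroth
    # cepstral coefficient (overall log gain) is non-zero.
    frame = [2.0] + [0.0] * 7
    print(real_cepstrum(frame))
```

The low-order coefficients of such a cepstrum describe the coarse spectral envelope, which is why they are commonly used as acoustic parameters.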
Upon execution of speech recognition, a recognition result is often presented (output) to the user to allow the user to confirm whether recognition has been successful.
In the case of speech recognition of text registration type, the registered text is normally output as information used to confirm the recognition result. On the other hand, in speech recognition of speech registration type, the speech recorded at the time of registration is output as information used to confirm the recognition result.
As described above, in speech recognition of speech registration type, the speech recorded at the time of registration is output for the purpose of confirming the recognition result. However, this speech is not recorded in an ideal environment such as a soundproof room, but in the actual environment where the speech recognition apparatus is operated. That is, the speech registered as a speech recognition target word includes background noise and the like. In addition, at the time of speech registration, the user does not always start the utterance immediately, and recording does not always end as soon as the utterance ends. Hence, unwanted silent periods are often added before and after the registered speech.
For this reason, in speech recognition of speech registration type, the speech output for the purpose of confirmation of the speech recognition result is hard to hear.
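To make the problem of unwanted silent periods concrete, the following is a minimal sketch that counts how many low-energy frames precede and follow the utterance in a registered recording. It is an illustration only and not a method described in this document: the function names, the frame length, and the fixed energy threshold are all assumptions, and a fixed threshold would be unreliable when the background noise mentioned above is strong.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def silent_margins(samples, frame_len=4, threshold=0.01):
    """Return (leading, trailing) counts of frames at or below `threshold`.

    `frame_len` and `threshold` are hypothetical values chosen for this
    sketch; real recordings need values tuned to the sampling rate and
    to the noise level.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        # No frame exceeds the threshold: treat the whole recording as silence.
        return len(energies), 0
    leading = active[0]
    trailing = len(energies) - 1 - active[-1]
    return leading, trailing


if __name__ == "__main__":
    # Two silent frames, two frames of "speech", one silent frame.
    recording = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4
    print(silent_margins(recording))
```

Large leading or trailing counts indicate that a listener must wait through silence before and after the registered word when the recording is played back for confirmation.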