1. Field of the Invention
The present invention relates to a speech recognition apparatus, a method of speech recognition, and a computer program product for speech recognition, according to which a character string in speech input is recognized.
2. Description of the Related Art
Conventionally, techniques for speech recognition have been developed to convert speech information into textual information through pattern collation between a speech utterance and previously stored speech analysis information. Currently available speech recognition techniques are not completely immune to recognition errors. To offset this shortcoming, various techniques have been proposed and are widely used to enhance the precision of speech recognition.
One conventional technique, for example, enables efficient acquisition of appropriate speech recognition results by selecting the most likely recognition candidate among plural recognition candidates and presenting it to the user, allowing the user to re-input the entire utterance if the selected recognition candidate is not correct, and excluding the rejected recognition candidate from further presentation. Such a technique, however, increases the operational load on the user because it requires re-input of the whole utterance.
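The select, reject, and exclude cycle described above can be illustrated with a minimal sketch. The function name `best_candidate`, the candidate strings, and the scores are hypothetical and are not taken from any cited reference; the scores merely stand in for the likelihoods that a recognizer would output.

```python
from typing import Dict, Optional, Set


def best_candidate(scored: Dict[str, float], rejected: Set[str]) -> Optional[str]:
    """Return the highest-scoring candidate that the user has not yet rejected."""
    remaining = {text: score for text, score in scored.items() if text not in rejected}
    if not remaining:
        return None  # every candidate has already been rejected
    return max(remaining, key=remaining.get)


# Hypothetical recognizer scores for the first utterance.
first_pass = {"recognize speech": 0.61, "wreck a nice beach": 0.64}
rejected: Set[str] = set()

shown = best_candidate(first_pass, rejected)
rejected.add(shown)  # the user indicates that the presented result is wrong

# The user re-inputs the entire utterance; the same wrong candidate scores
# highest again but is excluded from presentation.
second_pass = {"recognize speech": 0.66, "wreck a nice beach": 0.70}
shown_again = best_candidate(second_pass, rejected)
```

In this sketch the rejected candidate is filtered out before the maximum is taken, so the second presentation differs from the first even though the recognizer still ranks the wrong candidate highest.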
On the other hand, Japanese Patent Application Laid-open (JP-A) No. 2002-287792 discloses a technique according to which, when the selected recognition candidate is wrong, the user re-inputs only the portion that was not properly recognized. The recognition candidate is then corrected based on the re-input utterance, and the corrected candidate is shown to the user again. According to this technique, since the user does not need to re-utter the entire sentence, the load on the user is alleviated and the operability of the apparatus is improved.
In JP-A No. 2002-287792, the supplied information is assumed to have a hierarchical structure, as in addresses or telephone numbers. When the recognition candidate is corrected, the level of the re-input utterance in the hierarchy is determined, and the correction is carried out based on the determination. Here, only the patterns at the level of the erroneously recognized candidate need be selected as the targets of collation. Hence, a more efficient and more precise recognition process can be realized.
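The level-restricted collation described above can be sketched as follows. This is an illustration under stated assumptions and is not the method of the cited reference: the vocabulary `VOCAB`, the level names, and the function `correct_level` are hypothetical, the level of the re-input is assumed to be already determined, and a string similarity from Python's standard `difflib` module stands in for acoustic pattern collation.

```python
import difflib
from typing import Dict, List

# Hypothetical two-level hierarchy of stored patterns (an address-like structure).
VOCAB: Dict[str, List[str]] = {
    "prefecture": ["tokyo", "kyoto", "osaka"],
    "city": ["minato", "shibuya", "nara"],
}


def similarity(a: str, b: str) -> float:
    """Stand-in for acoustic collation: string similarity in the range [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def correct_level(result: Dict[str, str], level: str, reinput: str) -> Dict[str, str]:
    """Replace one level of the result, collating only against that level's patterns."""
    patterns = VOCAB[level]  # patterns of other levels are never considered
    best = max(patterns, key=lambda p: similarity(reinput, p))
    corrected = dict(result)  # leave the original result untouched
    corrected[level] = best
    return corrected


# The prefecture was misrecognized; the user re-inputs only that portion.
result = {"prefecture": "kyoto", "city": "minato"}
corrected = correct_level(result, "prefecture", "tokio")
```

Because only the patterns of the erroneous level are collated, the search space shrinks and the correctly recognized levels cannot be disturbed by the correction.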
Further, JP-A No. 2003-316386 proposes a technique that allows the user to re-input an utterance corresponding only to the erroneously recognized portion, and that deletes the recognition candidate selected for the previous utterance from the recognition candidates for the re-input, thereby avoiding selecting and presenting the same erroneous candidate to the user.
In general, in a speech recognition system which receives and recognizes a phrase or a sentence, erroneous recognition may occur in two patterns: firstly, only some words may be erroneously recognized; secondly, a burst error may occur, i.e., a whole utterance may be erroneously recognized due to the influence of noise or the like. When the erroneously recognized portions are small in number and the error is minor, it is efficient to correct only the pertinent portions. On the other hand, when the portions to be corrected are large in number, as in the case of a burst error, it is efficient to correct the entire utterance.
The conventional techniques, however, basically realize only one of the two types of error correction in speech recognition, i.e., either re-input for correction of the whole uttered sentence or re-input for correction of a part of the utterance. Thus, the manner of correction cannot be flexibly selected according to the type of recognition error.
Meanwhile, the technique disclosed in JP-A No. 2003-316386 can be applied to both the entire correction and the partial correction. However, this technique allows only one manner of correction for each of the entire correction and the partial correction, and hence the correction cannot be performed flexibly according to the manner of re-input by the user, i.e., whether the user re-inputs the whole utterance or a part of the utterance.