1. Field of the Invention
The present invention relates to a speech-recognition device.
2. Description of the Related Art
Generally, to enable a speech-recognition device to recognize speech produced by a dependent speaker, the dependent speaker is made to pronounce a predetermined word or the like so that a dictionary used for recognizing speech produced by that speaker is built. To build such a dictionary, the dependent speaker needs to pronounce each word or the like one to three times.
When a dependent speaker pronounces a word or the like only once, the burden borne by the speaker is relatively light. However, a good dictionary may not be built because of the environment (for example, background noise and/or the speech of surrounding persons) at the time the dictionary is registered. This is because the surrounding sound is mixed with the speech produced by the dependent speaker, and, as a result, the quality of the registered dictionary is degraded.
In contrast, when a dictionary is built (registered) from speech that a dependent speaker produces a plurality of times (for example, three times), it is possible to build an averaged dictionary based on the plurality of pronunciations. Alternatively, a dictionary may be built using the first-produced speech, and the second- or third-produced speech may then be matched against the dictionary so that the quality of the dictionary is evaluated. In either case, a better dictionary can be built than in the case where the dependent speaker produces the speech only once.
However, when a dependent speaker is made to pronounce the same word two or three times, building the dictionary is a burden to that person. For example, when 20 to 30 words are registered in the dictionary, building the dictionary is a very heavy burden.
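The averaging approach mentioned above can be illustrated with a minimal sketch. It assumes that each pronunciation has already been converted into a feature vector of the same length (that is, already time-aligned); the function name `average_template` and the sample values are hypothetical and are not taken from any particular prior-art device.

```python
from statistics import fmean


def average_template(utterances):
    """Average several equal-length feature vectors element-wise.

    Assumption: the utterances are already time-aligned, so that
    position i means the same thing in every vector.
    """
    if not utterances:
        raise ValueError("at least one utterance is required")
    length = len(utterances[0])
    if any(len(u) != length for u in utterances):
        raise ValueError("utterances must have the same length")
    # zip(*utterances) groups the i-th feature of every utterance together.
    return [fmean(column) for column in zip(*utterances)]


# Three pronunciations of the same word (illustrative values only).
template = average_template([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Averaging reduces the influence of noise that happens to be present in only one of the pronunciations, which is why such a dictionary tends to be better than one built from a single pronunciation.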
An object of the present invention is to provide a speech-recognition device in which a good dictionary to be used for recognizing speech produced by a dependent speaker can be built without placing a heavy burden on the dependent speaker.
In order to achieve the above-mentioned object, a device for speech recognition, according to the present invention, comprises:
a standard dictionary;
a feature extracting unit which extracts features from an input speech;
a matching unit which performs matching of the features of the input speech extracted by the feature extracting unit against the standard dictionary;
a result outputting unit which outputs a result of the matching performed by the matching unit; and
a dictionary updating unit which updates the standard dictionary, wherein:
the standard dictionary is built initially as a dictionary to be used for recognizing speeches produced by any independent speaker; and
the dictionary updating unit updates the standard dictionary so as to provide a dictionary to be used for recognizing speeches produced by a dependent speaker based on the result of matching of the features extracted from the input speech against the standard dictionary.
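The arrangement of units listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of Euclidean distance for matching, and the interpolation rule used for updating (with the hypothetical parameter `rate`) are not specified in this description, and the feature extraction is a placeholder rather than a real acoustic front end.

```python
import math


def extract_features(speech):
    # Placeholder feature extracting unit: a real device would compute
    # acoustic features; here the samples are simply used as features.
    return [float(x) for x in speech]


def match(features, dictionary):
    # Matching unit: return the (word, distance) pair of the nearest entry.
    scored = ((word, math.dist(features, entry))
              for word, entry in dictionary.items())
    return min(scored, key=lambda pair: pair[1])


def update(dictionary, word, features, rate=0.5):
    # Dictionary updating unit (assumed rule): move the stored entry
    # part of the way toward the features of the input speech.
    old = dictionary[word]
    dictionary[word] = [(1 - rate) * o + rate * f
                        for o, f in zip(old, features)]


# Standard dictionary, initially built for any independent speaker.
standard = {"yes": [0.0, 0.0], "no": [4.0, 4.0]}

features = extract_features([1, 1])
word, distance = match(features, standard)   # result outputting unit
update(standard, word, features)             # adapt to the dependent speaker
```

In this sketch a single pronunciation is enough to adapt an entry, because the entry starts from the speaker-independent value instead of from nothing.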
The standard dictionary may be built initially as a dictionary to be used for recognizing speeches produced by any independent speaker as a result of standard features of each string of characters being decomposed into phoneme units, the thus-obtained features of the respective phonemes being used as phoneme information, and the connection of the phonemes being used as path information;
the matching unit, when comparing features of input phonemes determined from the features extracted from the input speech for a string of characters with the phoneme information in the standard dictionary corresponding to the string of characters, may perform evaluation of phoneme distance between the features of the input phonemes and the phoneme information in the standard dictionary corresponding to the string of characters; and
the dictionary updating unit, based on the result of the evaluation of phoneme distance, may update the phoneme information in the standard dictionary corresponding to the string of characters, and, thus, update the standard dictionary so as to provide a dictionary to be used for recognizing speeches produced by a dependent speaker.
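One way to picture the phoneme information, the path information, and the evaluation of phoneme distance is sketched below. The data layout (two plain mappings), the romanized phoneme labels, the feature values, and the use of Euclidean distance are all assumptions made for illustration; the description does not fix any of them.

```python
import math

# Phoneme information: standard features of each phoneme (illustrative values).
phoneme_info = {"h": [1.0, 0.0], "a": [0.0, 1.0], "i": [1.0, 1.0]}

# Path information: how the phonemes are connected for each string of characters.
path_info = {"hai": ["h", "a", "i"]}


def phoneme_distances(word, input_phonemes):
    """Return (phoneme, distance) for each phoneme on the path of `word`.

    `input_phonemes` holds one feature vector per phoneme, determined
    from the features extracted from the input speech.
    """
    path = path_info[word]
    if len(path) != len(input_phonemes):
        raise ValueError("input does not match the phoneme path")
    return [(phoneme, math.dist(phoneme_info[phoneme], features))
            for phoneme, features in zip(path, input_phonemes)]


# The speaker's "a" differs from the standard; "h" and "i" do not.
distances = phoneme_distances(
    "hai", [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
```

Because phoneme information is shared across strings of characters, an update made for one phoneme would, in this layout, affect every word whose path contains that phoneme.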
The dictionary updating unit may update the phoneme information in the standard dictionary corresponding to the string of characters, and, thus, update the standard dictionary, only when the phoneme distance between the features of the input phonemes and the phoneme information in the standard dictionary corresponding to the string of characters exceeds a predetermined threshold as a result of the evaluation of phoneme distance.
The dictionary updating unit may update the phoneme information in the standard dictionary corresponding to the vowels of the string of characters, and, thus, update the standard dictionary, only when the phoneme distance between the features of the input phonemes and the phoneme information in the standard dictionary corresponding to the string of characters exceeds a predetermined threshold as a result of the evaluation of phoneme distance.
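The threshold-based update and its vowel-only variant might look like the following sketch. Several points are assumptions rather than part of this description: the distance is evaluated per phoneme, the update simply replaces the stored features with the input features, vowels are identified by the hypothetical set `VOWELS`, and the function name `adapt` and its parameters are chosen for illustration.

```python
import math

VOWELS = frozenset("aeiou")  # assumed vowel labels


def adapt(phoneme_info, path, input_phonemes, threshold, vowels_only=False):
    """Update phoneme information whose distance exceeds `threshold`.

    Returns the list of phonemes that were updated.
    """
    updated = []
    for phoneme, features in zip(path, input_phonemes):
        if vowels_only and phoneme not in VOWELS:
            continue  # consonants are left as in the standard dictionary
        if math.dist(phoneme_info[phoneme], features) > threshold:
            # Assumed update rule: replace with the speaker's features.
            phoneme_info[phoneme] = list(features)
            updated.append(phoneme)
    return updated


info = {"k": [1.0, 0.0], "a": [0.0, 1.0]}
changed = adapt(info, ["k", "a"], [[3.0, 0.0], [0.0, 4.0]],
                threshold=1.5, vowels_only=True)
```

Here both phonemes exceed the threshold, but with `vowels_only=True` only the vowel entry is rewritten, so less information has to be updated and stored.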
Thus, according to the present invention, there are provided a standard dictionary; a feature extracting unit which extracts features from an input speech; a matching unit which performs matching of the features of the input speech extracted by the feature extracting unit against the standard dictionary; a result outputting unit which outputs a result of the matching performed by the matching unit; and a dictionary updating unit which updates the standard dictionary. The standard dictionary is built initially as a dictionary to be used for recognizing speeches produced by any independent speaker, and the dictionary updating unit updates the standard dictionary so as to provide a dictionary to be used for recognizing speeches produced by a dependent speaker based on the result of matching of the features extracted from the input speech against the standard dictionary. Thereby, it is possible to remarkably ease the burden borne by a dependent speaker in producing a dictionary to be used for recognizing speeches produced by that speaker. Further, because a dictionary to be used for recognizing speeches produced by a dependent speaker can be built using information in a dictionary to be used for recognizing speeches produced by any independent speaker, it is possible to provide a high-performance speech-recognition device having a superior user interface.
In particular, in the arrangement in which only the phoneme information in the dictionary corresponding to the vowels of a string of characters is updated, the performance of the dictionary can be improved remarkably even though the amount of information to be updated is small. As a result, it is possible to reduce the size of the speech-recognition device while also improving its performance.