The present invention generally relates to voice recognition, and in particular to a voice recognition method, and an apparatus using the same, capable of carrying out both speaker-independent recognition and speaker-dependent recognition. The present invention may be suitably applied to voice response systems such as a voice dialing system and a voice word processing system.
Voice recognition is a well-known technology for identifying an unknown input voice pattern by comparing it with stored reference patterns and calculating a degree of similarity therebetween. Voice recognition may be classified into two types, i.e., speaker-independent recognition and speaker-dependent recognition. The conventional speaker-independent recognition uses a dictionary designed exclusively for storing reference patterns of spoken words for the speaker-independent recognition. Likewise, the conventional speaker-dependent recognition uses a dictionary designed exclusively for storing reference patterns of words for the speaker-dependent recognition. A reference pattern for the speaker-independent recognition is produced on the basis of voices uttered by a plurality of speakers in order to eliminate characteristics specific to any one individual. A reference pattern for the speaker-dependent recognition is produced for each individual, and therefore contains characteristics inherent in that individual. Therefore, a reference voice pattern of a word for the speaker-independent recognition is different from a reference voice pattern of the same word for the speaker-dependent recognition. That is, the speaker-independent and speaker-dependent reference patterns for one spoken word are not equivalent to each other. It is to be noted that, at the current stage of voice recognition technology, it is very difficult to produce a dictionary that can be used in common for the speaker-independent and speaker-dependent recognition processes.
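The difference between the two kinds of reference pattern can be illustrated with a minimal sketch. It assumes, purely for illustration, that each utterance is reduced to a short fixed-length feature vector and that a reference pattern is formed by element-wise averaging; the text above does not specify how patterns are produced, and the function name, speaker labels, and feature values below are all hypothetical.

```python
from statistics import fmean


def average_pattern(utterances):
    """Element-wise mean of equal-length feature vectors (assumed method)."""
    return [fmean(column) for column in zip(*utterances)]


# Hypothetical feature vectors for one spoken word, grouped by speaker.
utterances_by_speaker = {
    "speaker_a": [[1.0, 4.0, 2.0], [1.2, 3.8, 2.0]],
    "speaker_b": [[2.0, 3.0, 3.0]],
    "speaker_c": [[3.0, 2.0, 1.0]],
}

# Speaker-dependent reference: built from one individual's utterances only,
# so it keeps that individual's characteristics.
sd_ref = average_pattern(utterances_by_speaker["speaker_a"])

# Speaker-independent reference: built from utterances of several speakers,
# which smooths out individual characteristics.
all_utterances = [u for group in utterances_by_speaker.values() for u in group]
si_ref = average_pattern(all_utterances)

print("speaker-dependent reference:  ", sd_ref)
print("speaker-independent reference:", si_ref)
```

Under these assumptions the two reference patterns for the same word come out different, which is the non-equivalence described above.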
A voice recognition apparatus capable of performing both the speaker-independent recognition and the speaker-dependent recognition has been proposed. Such an apparatus contains two dictionaries, one of which is used for the speaker-independent recognition and the other of which is used for the speaker-dependent recognition. In this case, a reference voice pattern of a word is stored in only one of the two dictionaries. For example, a voice pattern of a word which is likely to be used often by many persons is registered in the dictionary for the speaker-independent recognition. In operation, when an unknown input voice is supplied to the apparatus, a pattern of the input voice is compared with the reference patterns for the speaker-independent recognition and is also compared with the reference patterns for the speaker-dependent recognition. This operation yields degrees of similarity between the input voice pattern and the reference patterns for the speaker-independent recognition, and degrees of similarity between the input voice pattern and the reference patterns for the speaker-dependent recognition. Then, the reference voice having the highest of the calculated degrees of similarity is selected as a first candidate for the unknown input voice.
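The operation of such a two-dictionary apparatus can be sketched as follows. This is only an illustrative sketch: the similarity measure (an inverse Euclidean distance), the dictionary contents, and all names are assumptions introduced here, not details of any actual apparatus.

```python
import math


def similarity(a, b):
    """Hypothetical degree of similarity: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def recognize(input_pattern, si_dictionary, sd_dictionary):
    """Score the input against both dictionaries and rank all candidates."""
    candidates = []
    for kind, dictionary in (("independent", si_dictionary),
                             ("dependent", sd_dictionary)):
        for word, reference in dictionary.items():
            score = similarity(input_pattern, reference)
            candidates.append((score, word, kind))
    # The raw scores from both dictionaries are pooled into a single ranking.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates


# Each word is registered in only one of the two dictionaries.
si_dictionary = {"call": [1.8, 3.2, 2.0], "stop": [4.0, 1.0, 1.0]}
sd_dictionary = {"home": [1.0, 4.0, 2.5], "office": [0.5, 0.5, 3.0]}

ranked = recognize([1.1, 3.9, 2.4], si_dictionary, sd_dictionary)
first_score, first_word, first_kind = ranked[0]
print(first_word, first_kind, round(first_score, 3))
```

Note that the single sorted list treats scores from the two dictionaries as directly comparable; the sketch makes that assumption only because the conventional scheme described above does.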
However, the above voice recognition apparatus has the disadvantage that recognition errors often occur. For example, the apparatus often selects the first candidate, having the highest degree of similarity, from among the reference patterns stored in the dictionary for the speaker-independent recognition, even when the input voice is uttered by a person whose voice has been registered in the dictionary for the speaker-dependent recognition. Of course, in that case the selected first candidate is incorrect. An incorrect candidate or candidates subsequent to the first candidate may also be selected from among the candidates obtained by the recognition type which does not match the speaker. Conversely, the apparatus often selects the first candidate from among the registered voices stored in the dictionary for the speaker-dependent recognition, even when the input voice is uttered by a speaker whose voice has not been registered in that dictionary. In this case as well, an incorrect candidate or candidates subsequent to the first candidate may be selected. Fundamentally, the degrees of similarity obtained by the speaker-independent recognition and by the speaker-dependent recognition cannot be handled as equivalent, because the two differ in both reference pattern and algorithm.
In addition, the apparatus must contain two different processors, one of which is used exclusively for the speaker-independent recognition and the other of which is used exclusively for the speaker-dependent recognition. Consequently, the hardware used for the voice recognition is considerably large.