Generally, to recognize a user based only on voice information, a speaker recognition method based on voice signal processing and pattern matching may be used. Speaker recognition determines who is speaking, that is, whose voice is being heard. It is required in various fields, for example, in speaker authentication systems using voice information, voice extraction systems, speaker recognition and information extraction systems that use voice signals from online multi-party communications, and real-time speaker tracking systems using multi-modal information.
A user recognition method based on related-art speaker recognition is implemented by collecting voice data from target users, extracting feature vectors from the data, and creating a statistical model for each user from those feature vectors. The GMM (Gaussian Mixture Model), a statistical model built from feature vectors, is widely used as the model for each user.
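The per-user modeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this document: it assumes feature vectors (for example, MFCC frames) have already been extracted, models each user with a diagonal-covariance GMM, and fits it with the standard expectation-maximization (EM) algorithm. The function names `fit_gmm` and `avg_log_likelihood`, the variance floor, and the random initialization are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical helpers for illustration; a GMM is represented as a
# (weights, means, variances) tuple with diagonal covariances.

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm(features, n_components, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to one user's feature vectors with EM."""
    rng = random.Random(seed)
    n, dim = len(features), len(features[0])
    # Initialization: means from randomly chosen frames, uniform weights,
    # and the global per-dimension variance for every component.
    means = [list(f) for f in rng.sample(features, n_components)]
    mu = [sum(f[d] for f in features) / n for d in range(dim)]
    global_var = [max(sum((f[d] - mu[d]) ** 2 for f in features) / n, var_floor)
                  for d in range(dim)]
    variances = [list(global_var) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per frame.
        resp = []
        for x in features:
            logs = [math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])
                    for k in range(n_components)]
            total = logsumexp(logs)
            resp.append([math.exp(lp - total) for lp in logs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-10:
                continue  # component received no data; keep its previous parameters
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, features)) / nk
                        for d in range(dim)]
            variances[k] = [max(sum(r[k] * (x[d] - means[k][d]) ** 2
                                    for r, x in zip(resp, features)) / nk, var_floor)
                            for d in range(dim)]
        # Renormalize so the mixture weights sum to one.
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    return weights, means, variances

def avg_log_likelihood(features, gmm):
    """Average per-frame log-likelihood of the features under a fitted GMM."""
    weights, means, variances = gmm
    return sum(logsumexp([math.log(w) + log_gaussian_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in features) / len(features)
```

In this sketch, `fit_gmm` would be called once per enrolled user on that user's training frames, yielding one `(weights, means, variances)` model per user; the variance floor keeps a component from collapsing onto a few frames.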
The previously created models are then used for user recognition. When a voice signal of a user is received, the feature vectors extracted from that signal are compared with the user models, and the user whose model is most likely to have produced those feature vectors is selected by a maximum likelihood method and identified as the user who spoke the input voice.
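The maximum likelihood decision can be illustrated with a short sketch. It assumes each user model is a diagonal-covariance GMM stored as a `(weights, means, variances)` tuple and that feature frames are independent, so per-frame log-likelihoods are summed over the utterance. The names `identify_speaker` and `gmm_log_likelihood`, and the model parameters in the usage lines, are hypothetical values chosen only to show the decision rule.

```python
import math

# Hypothetical representation: each user model is a (weights, means, variances)
# tuple describing a diagonal-covariance GMM, e.g. produced at enrollment.

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """log p(x | model), computed with a stable log-sum-exp over components."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))

def identify_speaker(features, user_models):
    """Maximum likelihood decision: pick the user whose model best explains all frames.

    Frames are treated as independent, so per-frame log-likelihoods are summed.
    """
    scores = {user: sum(gmm_log_likelihood(x, gmm) for x in features)
              for user, gmm in user_models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Usage with hand-made (hypothetical) models and a short three-frame utterance.
models = {
    "alice": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "bob": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
utterance = [[4.8, 5.1], [5.2, 4.9], [5.0, 5.3]]
speaker, scores = identify_speaker(utterance, models)
print(speaker)  # bob
```

Because the score accumulates over frames, a very short utterance contributes few frames and therefore less evidence separating competing user models, which is one way to read the data-length limitation discussed next.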
However, general speaker recognition requires at least 2 to 4 seconds of voice information to achieve an accuracy of about 90% or higher.
This requirement may inconvenience users when the method is applied to user recognition by a robot. For example, when a user must be identified from an utterance as short as a single word, the shortage of necessary voice data may degrade user recognition performance. In addition, requiring at least 2 to 4 seconds of voice information to exceed 90% accuracy slows down execution.