1. Field of the Invention
The present invention relates to voice recognition, and more particularly, to a method and apparatus for improving voice recognition performance in a mobile device.
2. Description of the Related Art
Recently, mobile devices such as mobile phones and personal digital assistants (PDAs) have become smaller, while their memory usage is increasing. In addition, the number of telephone numbers that can be stored in a mobile device has grown from hundreds to thousands. An ordinary user stores the telephone numbers of all acquaintances in the mobile device. To search for a telephone number or to make a call, the user can find the telephone number by using keys or by voice. A method of automatically dialing a telephone number by uttering an already registered name to the mobile device is referred to as name dialing or voice dialing. For a user of a mobile device to use name dialing effectively, the voice recognition performance of the mobile device must be high.
Meanwhile, much research has been carried out to improve a device's recognition of the voice of a specific user. Most of this research has employed speaker adaptation, applying a variety of techniques that adapt an acoustic model to a specific user. These methods can be broadly divided into maximum a posteriori (MAP) methods and maximum likelihood linear regression (MLLR) methods, and methods capable of achieving high performance using just a small amount of adaptation data have been suggested. However, these methods require much computation and large memories, and thus cannot be applied easily.
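For illustration only, the commonly cited form of the MAP update for a single Gaussian mean can be sketched as follows. This is a minimal sketch, not the method of any reference cited here: the function name `map_adapt_mean`, the prior weight `tau`, and the per-frame posteriors are assumptions introduced for the example. The update interpolates between the speaker-independent mean and the occupancy-weighted mean of the adaptation frames.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """Hypothetical helper: MAP re-estimation of one Gaussian mean.

    new_mean = (tau * prior_mean + sum_t g_t * x_t) / (tau + sum_t g_t)

    where x_t are adaptation frames, g_t is the posterior (occupancy) of
    this Gaussian for frame t, and tau is an assumed prior weight.
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Occupancy-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With little adaptation data the result stays close to the prior mean, and it moves toward the speaker's data as the occupancy grows. An acoustic model holds many such Gaussians, and the posteriors must be computed for every frame, which is one way to see why the text describes these methods as demanding in computation and memory.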
FIG. 1 is a schematic block diagram illustrating a voice recognition apparatus according to conventional technology.
Referring to FIG. 1, the voice recognition apparatus 100 includes a feature extraction unit 110 extracting a feature vector from a voice sample corresponding to a user's utterance converted into a digital signal, a voice interval detection unit 120 detecting the start point and the end point of the user's utterance, a matching unit 130 matching an obtained feature vector with voice models stored in a voice model unit 140 if the start point of the voice is detected, and a determination unit 150 determining whether to accept or refuse the result of matching.
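The data flow among the units of FIG. 1 can be sketched roughly as below. This is a toy illustration under stated assumptions, not the conventional apparatus itself: the log-energy feature, the energy threshold, the distance-based score, and all function names and default values are hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from typing import Dict, List, Optional, Tuple

Frame = List[float]


def extract_features(samples: List[float], frame_len: int = 4) -> List[Frame]:
    """Feature extraction unit (110): one toy [log-energy] vector per frame."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [[math.log(sum(s * s for s in f) / frame_len + 1e-10)] for f in frames]


def detect_interval(
    features: List[Frame], threshold: float = -5.0
) -> Optional[Tuple[int, int]]:
    """Voice interval detection unit (120): first and last frame above threshold."""
    voiced = [i for i, f in enumerate(features) if f[0] > threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]


def match(features: List[Frame], models: Dict[str, Frame]) -> List[Tuple[str, float]]:
    """Matching unit (130): score the utterance against each stored model (140).

    The score is the negative Euclidean distance between the mean feature
    vector and the model vector (an assumed, simplified similarity).
    """
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    scores = [(name, -math.dist(mean, model)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def decide(ranked: List[Tuple[str, float]], min_score: float = -1.0) -> Optional[str]:
    """Determination unit (150): accept the best match or refuse it."""
    if not ranked:
        return None
    best_name, best_score = ranked[0]
    return best_name if best_score >= min_score else None


def recognize(samples: List[float], models: Dict[str, Frame]) -> Optional[str]:
    """Run the units in order; matching starts only if speech is detected."""
    features = extract_features(samples)
    interval = detect_interval(features)
    if interval is None:
        return None
    start, end = interval
    return decide(match(features[start:end + 1], models))
```

A real recognizer would use spectral features and statistical acoustic models rather than a single log-energy value, but the order of operations (extract, detect the interval, match, then accept or refuse) follows the description above.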
The voice recognition apparatus 100 illustrated in FIG. 1 temporarily stores a voice recognition result and the feature vector calculated when the voice is recognized. Then, based on the pattern of the user's manipulation of the device, the voice recognition apparatus 100 determines whether or not the result is reliable, and uses a reliable result for acoustic model adaptation.
Also, like the voice recognition apparatus 100 illustrated in FIG. 1 and described above, U.S. Pat. No. 7,050,550 filed by Philips Corporation, titled, “Method for the training or adaptation of a speech recognition device”, uses an acoustic adaptation method.
FIG. 2 is a schematic block diagram illustrating a voice recognition apparatus using analysis of usage patterns by user according to conventional technology.
Referring to FIG. 2, the voice recognition apparatus 200 using analysis of usage patterns by user includes: a preprocessing unit 210 analyzing a caller's telephone number and loading a personal name management (PNM) database (DB) 250 corresponding to the telephone number; a recognition unit 220 recognizing an uttered voice of the caller and selecting a recognition result (n-best) corresponding to the recognized word; a recognition word selection unit 230 readjusting the n-best result by using the PNM DB 250 and a recognition word selection rule; and a PNM DB management unit 240 analyzing usage patterns by caller, in which the number of recognition words being used is limited, and managing the PNM DB 250 according to the characteristics of each caller. Depending on whether recognition is successful, a name is registered in or deleted from an exclusion list in the PNM DB 250, and data on recognition success and failure for each caller telephone number is stored and managed in the PNM DB 250.
According to this method, a list of words in the recognition vocabulary that are frequently misrecognized is managed for each user, and words that were previously misrecognized are excluded from the voice recognition result.
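The exclusion-list handling described above can be sketched as follows. This is a minimal sketch under assumptions: the names `readjust_nbest` and `update_exclusion`, the in-memory dictionary standing in for the PNM DB 250, and the rule that a failed recognition registers a name while a successful one deletes it are illustrative choices, since the text does not spell out the recognition word selection rule.

```python
from typing import Dict, List, Set, Tuple

NBest = List[Tuple[str, float]]  # (recognized name, score), best first


def readjust_nbest(nbest: NBest, exclusion: Set[str]) -> NBest:
    """Drop candidates on the caller's exclusion list, keeping the order."""
    return [(name, score) for name, score in nbest if name not in exclusion]


def update_exclusion(
    exclusions: Dict[str, Set[str]], caller: str, name: str, success: bool
) -> None:
    """Per-caller bookkeeping (assumed rule): register the name after a
    failed recognition, delete it after a successful one."""
    entry = exclusions.setdefault(caller, set())
    if success:
        entry.discard(name)
    else:
        entry.add(name)


# Usage with a hypothetical caller number and candidate names.
exclusions: Dict[str, Set[str]] = {}
update_exclusion(exclusions, "555-0100", "Jon", success=False)
filtered = readjust_nbest([("Jon", 0.9), ("John", 0.8)], exclusions["555-0100"])
```

Here `filtered` keeps only the candidate that is not on the caller's exclusion list. Note that the `success` flag has to come from somewhere, which in the described method means asking the caller.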
However, this method has a drawback in that the user is continuously asked, by voice synthesis, whether or not the result is correct. That is, this method requires the user's feedback in order to update information. Also, the method cannot predict whom the user will mainly call and apply that prediction; it can only delete previously misrecognized words from the vocabulary that is the object of voice recognition.
Meanwhile, in the mobile device field, enhancing voice recognition performance with the conventional speaker adaptation method, that is, a method that mainly adapts an acoustic model to the characteristics of a user, requires a huge amount of computation and a large memory. Furthermore, if speaker adaptation is performed using a misrecognized result, performance can degrade rapidly. Accordingly, in an environment in which resources are limited, such as a mobile device environment, it is difficult to use the conventional speaker adaptation methods based on an acoustic model.