1. Field of the Invention
The invention generally relates to speech recognition devices, and is particularly concerned with improving their recognition rates through the use of hybrid speaker-specific and non-speaker-specific phrase matching and normalization techniques.
2. Description of the Related Art
Speech recognition devices can be generally classified into two types. The first type is the specific-speaker speech recognition device that only recognizes the speech of a specific speaker, and the second type is the non-specific speaker speech recognition device that can recognize the speech of non-specific speakers.
In the case of a specific-speaker speech recognition device, a specific speaker first registers his or her speech signal patterns as reference templates by entering the words or phrases to be recognized one at a time, according to a specified interactive procedure. After this registration, when the speaker utters one of the registered words, speech recognition is performed by comparing the feature pattern of the entered word with the registered speech templates. One example of this kind of interactive speech recognition device is a speech recognition toy. The child who uses the toy pre-registers about 10 phrases, such as "Good morning," "Good night" and "Good day," as speech instructions. In use, when the speaker says "Good morning," the speaker's speech signal is compared with the registered "Good morning" template. If the two speech signals match, an electrical signal corresponding to that speech instruction is generated, which then causes the toy to perform a specified action.
As the name implies, this type of specific-speaker speech recognition device can recognize only the speech of the specific speaker, or speech with a highly similar pattern. Furthermore, since the phrases to be recognized must be registered one at a time as part of device initialization, the setup procedure is tedious and cumbersome.
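The registration-then-matching procedure of a specific-speaker device can be illustrated with a minimal sketch. It assumes that each utterance has already been converted into a sequence of per-frame feature vectors, and it uses dynamic time warping (DTW) as the template distance so that the same phrase spoken faster or slower still matches. DTW, the class `SpecificSpeakerRecognizer`, its methods, and the rejection `threshold` are all hypothetical choices made for illustration; they are not taken from the devices described here.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]   # one feature vector per analysis frame (assumed)
Pattern = List[Frame]     # an utterance as a sequence of frames (assumed)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(p: Pattern, q: Pattern) -> float:
    """Dynamic time warping distance; tolerates differences in speaking rate."""
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class SpecificSpeakerRecognizer:
    """Hypothetical specific-speaker recognizer: one registered template per phrase."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical rejection threshold
        self.templates: Dict[str, Pattern] = {}

    def register(self, phrase: str, pattern: Pattern) -> None:
        # Registration phase: the speaker enters each phrase once.
        self.templates[phrase] = pattern

    def recognize(self, pattern: Pattern) -> Optional[str]:
        # Recognition phase: compare the input with every registered template.
        best_phrase, best_dist = None, float("inf")
        for phrase, template in self.templates.items():
            dist = dtw_distance(pattern, template)
            if dist < best_dist:
                best_phrase, best_dist = phrase, dist
        # Report a match only if the closest template is close enough.
        return best_phrase if best_dist <= self.threshold else None


# Usage with toy two-dimensional feature vectors.
rec = SpecificSpeakerRecognizer(threshold=1.0)
rec.register("Good morning", [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
rec.register("Good night", [[5.0, 5.0], [6.0, 4.0]])
result = rec.recognize([[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
print(result)  # prints: Good morning
```

Because only the registering speaker's own templates are stored, an input from a speaker with a markedly different voice tends to exceed the threshold and be rejected, which reflects the limitation noted above.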
By contrast, a non-specific-speaker speech recognition device creates feature pattern data for the recognition target phrases described above from speech uttered by a large number of speakers (e.g., around 200), and stores (registers) this data in advance. Speech uttered by a non-specific speaker is then compared with these pre-registered phrases for recognition.
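The non-specific-speaker approach can likewise be sketched under simplifying assumptions. Here each phrase's reference pattern is the frame-by-frame mean of many speakers' patterns, all assumed to be already time-aligned to the same length, and recognition picks the nearest reference. The averaging scheme and the names `average_pattern`, `build_reference` and `recognize` are hypothetical simplifications for illustration, not the method of any particular device.

```python
import math
from typing import Dict, List, Sequence

Pattern = List[List[float]]  # fixed-length sequence of feature vectors (assumed)


def average_pattern(samples: Sequence[Pattern]) -> Pattern:
    """Frame-by-frame mean over many speakers' time-aligned patterns."""
    n_frames = len(samples[0])
    n_dims = len(samples[0][0])
    return [
        [sum(s[f][d] for s in samples) / len(samples) for d in range(n_dims)]
        for f in range(n_frames)
    ]


def pattern_distance(a: Pattern, b: Pattern) -> float:
    """Sum of per-frame Euclidean distances between equal-length patterns."""
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))


def build_reference(training: Dict[str, Sequence[Pattern]]) -> Dict[str, Pattern]:
    # Done once, in advance, from speech by many speakers.
    return {phrase: average_pattern(samples) for phrase, samples in training.items()}


def recognize(reference: Dict[str, Pattern], pattern: Pattern) -> str:
    # The nearest reference pattern wins; the end user registers nothing.
    return min(reference, key=lambda phrase: pattern_distance(reference[phrase], pattern))


# Usage: two training speakers per phrase, two frames per pattern.
training = {
    "Good morning": [[[0.0, 1.0], [2.0, 0.0]], [[0.2, 0.8], [1.8, 0.2]]],
    "Good night": [[[5.0, 5.0], [6.0, 4.0]], [[5.2, 4.8], [6.2, 3.8]]],
}
reference = build_reference(training)
print(recognize(reference, [[0.1, 0.9], [1.9, 0.1]]))  # prints: Good morning
```

Since every reference is an average over the training population, a voice far from that average (for example, a toddler's voice when the training speakers were mostly adults) can end up closer to the wrong reference.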
Although such non-specific-speaker speech recognition devices can achieve relatively high recognition rates for "typical" voices, they cannot always achieve high recognition rates across all voice types, speech features, and linguistic variations within a given language. For example, voice characteristics vary widely with the speaker's age and sex, as between a toddler and an adult, or between a woman and a man. In some cases, a speech recognition device may achieve extremely high recognition rates for adults' voices but perform very poorly with toddlers' voices.
Furthermore, this type of speech recognition device may also be used in automatic vending machines. For example, if such a device is used in an automatic ticketing machine, it becomes possible to buy tickets at railway stations, various facilities, restaurants, etc. simply by speaking voice commands. If such a system could be implemented, cumbersome operations such as checking fee tables for the correct amount and pressing the correct buttons would be eliminated, enabling senior citizens, children, and physically handicapped people to buy tickets with relative ease.
However, many problems stand in the way of commercial implementation. In particular, this type of speech recognition device must be able to recognize the voice of a non-specific speaker at extremely high recognition rates, regardless of differences in voice characteristics due to the speaker's age, sex, or individual speech mannerisms.