1. Field of the Invention
The invention generally relates to interactive speech recognition instruments which recognize speech and produce an audible response or specified action based on developed recognition results, and is particularly concerned with voice-based activation of such instruments.
2. Description of the Related Art
Speech recognition devices can be generally classified into two types. The first type is the specific-speaker speech recognition device that can only recognize the speech of a specific speaker, and the second general type is the non-specific speaker speech recognition device that can recognize the speech of non-specific speakers.
In the case of a specific-speaker speech recognition device, a specific speaker first registers his or her speech signal patterns as reference templates by entering recognizable words one at a time according to a specified interactive procedure. After registration, when the speaker issues one of the registered words, speech recognition is performed by comparing the feature pattern of the entered word to the registered speech templates. One example of this kind of interactive speech recognition device is a speech recognition toy. The child who uses the toy pre-registers, for example, about 10 phrases such as "Good morning," "Good night" and "Good day," as multiple speech instructions. In practice, when the speaker says "Good morning," his or her speech signal is compared to the speech signal of the registered "Good morning." If there is a match between the two speech signals, an electrical signal corresponding to the speech instruction is generated, which then makes the toy perform a specified action.
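The comparison of an entered word against registered templates can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a fixed-length feature vector and that a "match" means the nearest template lies within a distance threshold. The Euclidean distance measure, the function names, the threshold and the sample numbers are all hypothetical and are not taken from any particular device.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates, threshold):
    """Return the registered phrase nearest to `features`.

    Returns None when no registered template lies within `threshold`,
    i.e. when the input matches none of the registered phrases.
    """
    best_phrase, best_dist = None, float("inf")
    for phrase, template in templates.items():
        d = distance(features, template)
        if d < best_dist:
            best_phrase, best_dist = phrase, d
    return best_phrase if best_dist <= threshold else None


# Hypothetical templates registered by one speaker (illustrative values only).
templates = {
    "Good morning": [0.9, 0.1, 0.4],
    "Good night": [0.2, 0.8, 0.5],
}

# An utterance close to the registered "Good morning" pattern.
result = recognize([0.85, 0.15, 0.4], templates, threshold=0.3)
```

In this sketch, an input far from every template yields None, which corresponds to the case where no electrical signal is generated and the toy takes no action.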
As the name implies, of course, this type of specific-speaker speech recognition device can recognize only the speech of a specific speaker or speech possessing a similar pattern. Furthermore, since the phrases to be recognized must be registered one at a time as part of device initialization, using such a device is quite cumbersome and complex.
By contrast, a non-specific speaker speech recognition device creates standard speech feature patterns of the recognition target phrases described above, using "canned" speech exemplars spoken by a large number (e.g., around 200) of speakers. Phrases spoken by a non-specific speaker/user are then compared to these pre-registered recognizable phrases for recognition.
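One simple way to picture the creation of a standard feature pattern from many speakers is element-wise averaging of their exemplars. The sketch below is an assumption made for illustration only: the text does not state how the standard patterns are derived, and the function name and sample values are hypothetical.

```python
def build_standard_pattern(exemplars):
    """Average equal-length feature vectors element-wise.

    `exemplars` holds one feature vector per speaker for the same phrase.
    Averaging is an illustrative assumption, not a documented method.
    """
    if not exemplars:
        raise ValueError("at least one exemplar is required")
    length = len(exemplars[0])
    if any(len(vec) != length for vec in exemplars):
        raise ValueError("all exemplars must have the same length")
    count = len(exemplars)
    return [sum(vec[i] for vec in exemplars) / count for i in range(length)]


# Two hypothetical speakers saying the same phrase (illustrative values only).
standard = build_standard_pattern([[1.0, 2.0], [3.0, 4.0]])
```

A practical device would draw on far more speakers (the text mentions around 200) so that the resulting pattern tolerates differences in voice between users.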
However, such speech recognition devices usually become ready to perform recognition operations and responses only when an external switch is turned on or external power is delivered to the device, regardless of whether the device uses specific or non-specific speaker recognition. In some types of speech recognition devices, however, it would be more convenient if the device were in a standby state waiting for speech input at all times, and performed recognition operations by sensing speech input, without the need for the user to turn on the switch every time.
Take a stuffed toy utilizing speech recognition, for example. If the toy can be kept in a speech input standby state, i.e., a so-called sleep mode, and can respond instantly when the child calls out its name, without the need for plugging the device in or pressing a button, its appeal as a user-friendly device is greatly enhanced, especially for younger children, for whom applying external power may raise safety concerns. In addition to toys, the same can be said of all electronic instruments that utilize speech recognition.
Some issues must be resolved when keeping the device in a sleep mode and having it perform recognition operations by sensing speech input, as explained above. These include, for example, power consumption and the ability of the device to differentiate between phrases to be recognized and noise, and to act only in response to phrases to be recognized. In particular, since most toys run on batteries, minimizing battery drain is a major issue. Additionally, product prices must be kept low to maintain commercial appeal for such devices, so using expensive, conventional activation circuitry is undesirable. Heretofore, therefore, there have been a large number of technical restrictions on commercializing interactive speech devices which also feature voice activation.