1. Field of the Invention
The present invention relates to an audio input device which calculates, by means of speech recognition, the degree of coincidence between vocalizations of a user and objective recognition terms stored in a voice dictionary of the audio input device, and which further extracts from the voice dictionary the objective recognition terms having the highest degree of coincidence and outputs a command to operate the installation, among a plurality of installations, that corresponds to the extracted objective recognition terms.
2. Description of the Related Art
As an earlier technology, an audio input device as shown in FIG. 1 has been proposed. This audio input device is installed in an automobile.
The audio input device has a signal-processing unit 5 including a CPU (central processing unit) 1 and a memory 3. An external memory 7 having a voice dictionary 7a is connected with the signal-processing unit 5.
As a pickup unit for picking up the user's vocalizations, a microphone 11 is connected to the signal-processing unit 5 through an A/D (analog/digital) converter 9. Similarly, an amplifier 15 for driving a speaker 17 is connected to the signal-processing unit 5 through a D/A (digital/analog) converter 13. Further, a dialogue start switch 19 and a monitor display unit 21 are also connected to the signal-processing unit 5. In operation, in response to a user's voice inputted through the microphone 11, the signal-processing unit 5 outputs a command signal S to any one of the installations 23, for example, a radio, a CD (compact disc) player, an air conditioner, etc.
FIG. 2 shows the formation of the voice dictionary 7a stored in the external memory 7. In this voice dictionary 7a there are stored objective recognition terms which are used in case of driving the radio, the CD player and the air conditioner by the user's voice. The external memory 7 is adapted so as to be able to accept the user's various vocalizations represented with numerals 1101 to 1109 in the figure. Note that, in FIG. 2, each letter X represents a definite number. For example, if the recognition terms corresponding to the numeral 1106 are extracted as a result of the user's vocalizing of “Cee Dee number ten!”, then it is concluded that the speech recognition has been achieved successfully.
The above-mentioned audio input device operates as follows. FIG. 3 is a flow chart for explanation of the operation of the device.
First, at step 1200, the signal-processing unit 5 reads the voice dictionary 7a of the external memory 7 once the audio input device is powered on. At step 1201, it is judged whether or not the dialogue start switch 19 is operated by the user. If the judgement at step 1201 is Yes, then the routine goes to step 1202, where the signal-processing unit 5 begins sampling the user's vocalizations of the terms stored in the voice dictionary 7a.
For instance, if the user vocalizes “Cee Dee number ten!”, the signal-processing unit 5 detects the user's vocalizations in the following manner.
The sound data inputted through the microphone 11 is converted into digital signals by the A/D converter 9. Until the dialogue start switch 19 is operated by the user, the signal-processing unit 5 calculates an average of the above digital signals with respect to their power (intensity of signals). Once the dialogue start switch 19 has been operated, when the instantaneous power of the digital signal becomes larger than the calculated power average of the digital signals by a predetermined value, the signal-processing unit 5 judges that the user has vocalized and starts reading the user's vocalizations.
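The start-of-vocalization judgement described above may be illustrated by the following minimal sketch. It is not taken from the earlier device; the frame-based processing, the use of the mean squared amplitude as "power", the additive margin, and the names mean_power and detect_onset are all assumptions made for illustration only.

```python
def mean_power(samples):
    """Average power of a block of samples (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def detect_onset(background, frames, margin):
    """Return the index of the first frame judged to contain a vocalization.

    background: samples collected before the dialogue start switch is operated
                (assumed to represent the ambient noise level).
    frames:     list of sample blocks collected after the switch is operated.
    margin:     hypothetical "predetermined value" by which the frame power
                must exceed the background average.
    Returns None if no frame exceeds the threshold.
    """
    baseline = mean_power(background)
    for index, frame in enumerate(frames):
        if mean_power(frame) > baseline + margin:
            return index
    return None
```

In this sketch, a frame is treated as the start of the user's vocalization as soon as its power exceeds the background average by more than the margin; how the predetermined value is actually chosen is not stated in the description above.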
Returning to FIG. 3, at step 1203, the signal-processing unit 5 calculates the degree of coincidence between the group of recognition terms in the voice dictionary 7a loaded into the memory 3 and the user's vocalizations. The calculation of the degree of coincidence is carried out by the HMM (hidden Markov model) method, which has been the mainstream speech recognition algorithm in recent years. Note that, even while the degree of coincidence is being calculated, the above-mentioned operation of reading the user's vocalizations is continued through parallel processing by the unit 5. At step 1204, it is judged whether or not the user's vocalizing has been finished. For example, when the instantaneous power of the digital signal remains less than a designated value for a predetermined period, it is judged that the user's vocalizing has been finished, and the routine goes to step 1205, where the operation of reading the user's vocalizations is ended. If, on the other hand, the judgement at step 1204 is No, the routine goes back to step 1203 to continue the above calculation.
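The end-of-vocalization judgement at step 1204 may be sketched as follows. This is only an illustrative assumption of how such a judgement could be made; the per-frame power values, the counting of consecutive quiet frames as the "predetermined period", and the name detect_end are hypothetical.

```python
def detect_end(frame_powers, threshold, hold_frames):
    """Return the index of the frame at which the vocalization is judged finished.

    frame_powers: sequence of instantaneous power values, one per frame.
    threshold:    hypothetical "designated value" for the power.
    hold_frames:  hypothetical "predetermined period", counted in frames.
    Returns None if the power never stays below the threshold long enough.
    """
    quiet = 0  # number of consecutive frames below the threshold
    for index, power in enumerate(frame_powers):
        if power < threshold:
            quiet += 1
            if quiet >= hold_frames:
                return index
        else:
            quiet = 0  # a louder frame restarts the count
    return None
```

A single quiet frame between louder frames does not end the reading operation in this sketch, because the counter is reset whenever the power rises to or above the designated value.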
At the next step 1206, the signal-processing unit 5 selects the recognition term whose degree of coincidence is the highest. In the above instance, the recognition term “ten” following the term “Cee Dee”, which corresponds to the letter X at the numeral 1106, has the highest degree of coincidence. Thereafter, at step 1207, the signal-processing unit 5 informs the user, by means of a phonetic sound, that the recognition term having the highest degree of coincidence has been recognized (feedback of recognition results). In detail, at the same step, an audio message of “Now, playing the tenth number of CD player!” is synthesized from sound data stored in the external memory 7 and is presented to the user through the amplifier 15 and the speaker 17.
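The selection at step 1206 amounts to choosing the entry with the maximum score. A minimal sketch follows, assuming that the degrees of coincidence are held as a mapping from each recognition term to a numeric score; the function name select_best and this data layout are hypothetical and are not specified in the description above.

```python
def select_best(scores):
    """Return the (term, score) pair having the highest degree of coincidence.

    scores: dict mapping each recognition term to its degree of coincidence
            (assumed to be a number where larger means a closer match).
    """
    if not scores:
        raise ValueError("no recognition terms were scored")
    best_term = max(scores, key=scores.get)
    return best_term, scores[best_term]
```

For instance, given illustrative scores in which “Cee Dee number ten” scores higher than every other term, that term would be the one fed back to the user at step 1207.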
Next, at step 1208, the signal-processing unit 5 judges whether or not a “decision” switch (not shown) has been pushed within a predetermined period. If the user's manipulation of the decision switch is detected (Yes), then the routine goes to step 1209 to output a command to operate the installation 23 corresponding to the objective recognition terms recognized in this routine. In this case, a command is generated to cause the CD player to play the tenth track on the compact disc presently loaded in the CD player. If, on the other hand, the judgement at step 1208 is No, that is, the user's manipulation of the decision switch is not detected, then the routine goes back to step 1201.
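The confirmation at steps 1208 and 1209 may be sketched as below. The command table, its contents, and the name dispatch_if_confirmed are assumptions introduced for illustration; the description above does not specify how the command signal is encoded.

```python
def dispatch_if_confirmed(recognized_term, decision_pressed, command_table):
    """Return the command for the recognized term only if the user confirmed it.

    recognized_term:  the term selected as having the highest degree of coincidence.
    decision_pressed: True if the decision switch was detected in the waiting period.
    command_table:    hypothetical mapping from recognition terms to
                      (installation, command) pairs.
    Returns None when no confirmation was detected, which corresponds to the
    routine going back to waiting for the dialogue start switch (step 1201).
    """
    if not decision_pressed:
        return None
    return command_table[recognized_term]
```

In this sketch no command is issued without the user's confirmation, which mirrors the described behaviour of outputting the command only after the decision switch is detected.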