In a conventional technique, a user's utterance is recognized and used to operate a device. If the device to be operated outputs a sound (broadcast speech, artificial speech, and so on), this sound acts as noise when the user's speech is recognized. To address this, a technique has been proposed that improves the accuracy of speech recognition by using an echo canceller to cancel the device's sound from an input signal in which that sound is mixed with speech uttered by a speaker (user). However, this technique requires the computation for the echo canceller. Accordingly, it is difficult to realize on a device having limited processing capability.
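To make the computational burden of echo cancellation concrete, the following is a minimal sketch of one common form of echo canceller, a normalized least-mean-squares (NLMS) adaptive filter. The choice of NLMS, the function name `nlms_echo_cancel`, and the parameters `taps`, `mu`, and `eps` are assumptions for illustration only; the conventional technique described above does not specify a particular algorithm.

```python
def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller (hypothetical helper, not from the source).

    reference: samples of the sound the device itself outputs
    mic:       samples picked up by the microphone (device sound + user speech)
    Returns (residual, weights): the microphone signal with the estimated
    device sound subtracted, and the final adaptive filter weights.
    """
    w = [0.0] * taps      # adaptive weights: running estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]                         # shift in newest sample
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated device sound
        e = d - y                                    # what remains after cancelling
        norm = sum(b * b for b in buf) + eps         # reference power (normalizer)
        g = mu * e / norm
        w = [wi + g * bi for wi, bi in zip(w, buf)]  # NLMS weight update
        residual.append(e)
    return residual, w
```

Even in this small sketch, every input sample costs on the order of `taps` multiply-accumulate operations for the filter output, the power estimate, and the weight update. A practical echo path generally needs far more taps than the 8 assumed here, which is why such processing can be hard to fit on a device having limited processing capability.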
On the other hand, devices that mute the sound while recognizing the user's speech are also in use. With such a device, no sound is output while the user's speech is being recognized. Accordingly, the user's speech is recognized without being affected by the sound. However, if the device to be operated is a television set, the user (viewer) cannot listen to the sound (speech) broadcast from the television set while the speech is being recognized.