Speech uttered by a speaker is conventionally used to control various functions of a camera or a car navigation device. For instance, JP-A-S64-56428 describes a camera control system using voice input as follows: speech corresponding to a required manipulation is inputted; the speech is recognized by a voice recognition unit; and the camera is controlled by control processing corresponding to the recognition result.
In this voice-controlled camera, each function can be executed only by a specific voice command having a one-to-one correspondence with that function. For instance, only “no strobe” functions as the voice command for prohibiting strobe light during shooting, even though a user might instead say “strobe off,” “stop strobe,” or “flash off.”
A user therefore needs to memorize exactly the voice command that executes each function. The user's workload thus increases as the number of executable functions increases, which worsens the usability of voice input.
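The one-to-one correspondence described above may be illustrated by the following minimal sketch. It is not taken from JP-A-S64-56428; the table `COMMAND_TABLE`, the function name `disable_strobe`, and the helper `dispatch` are hypothetical names introduced only for illustration, and the case and whitespace normalization is likewise an assumption.

```python
# Hypothetical command table: exactly one accepted phrase per function.
# The names below are illustrative assumptions, not part of the cited system.
COMMAND_TABLE = {
    "no strobe": "disable_strobe",
}


def dispatch(recognized_text):
    """Return the function name for an exact phrase match, otherwise None."""
    # Assumed normalization: lower-case and collapse runs of whitespace.
    key = " ".join(recognized_text.lower().split())
    return COMMAND_TABLE.get(key)


if __name__ == "__main__":
    # Only the registered phrase maps to a function; synonyms are rejected.
    for phrase in ("no strobe", "strobe off", "stop strobe", "flash off"):
        print(f"{phrase!r} -> {dispatch(phrase)}")
```

In such a sketch, each additional function adds one more phrase that the user must recall exactly, since no synonymous phrase reaches the function.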
In voice recognition, a shorter word is more apt to be misrecognized. For instance, in a car navigation device, a user may input the address of a destination through voice input and then be required to determine whether a point designated on a map is correct as the destination. In this case, the user sets or cancels the destination by uttering “YES” or “NO,” respectively. However, a short word such as “YES” or “NO” is apt to be misrecognized, so that the function of setting the destination is sometimes executed against the user's intention.