1. Field of the Invention
The present invention relates to an image processing apparatus which includes a speech recognition function.
2. Description of the Related Art
As devices have become more sophisticated in recent years, the operation procedures of devices have also become more complex. Consequently, there are strong demands for a user interface (UI) that can be operated intuitively by a user who is not used to operating a device.
An example of a UI that can be operated intuitively is speech operation, which uses speech recognition. In speech operation, a user speaks a speech command into a microphone of a device, and the device performs a process according to the command. Speech operation has been put into practical use in, for example, car navigation systems and computer telephony integration (CTI) systems.
In most speech operation systems that have been put into practical use, the vocabulary which a user can speak consists of commands that are originally embedded in the device. However, there is a technique in which a user can register speech to a specific function or an object, such as an application, a UI layer, or a Uniform Resource Locator (URL). The user can then call the function or the object by speaking the same content as the registered speech (for example, refer to Japanese Patent Application Laid-Open No. 11-184670). Hereinafter, such a technique will be referred to as a “speech bookmark”. Moreover, a function or an object to which a speech bookmark can be assigned will be referred to as a “speech bookmark target”. Furthermore, the act of assigning speech to a specific function or object will be referred to as “speech registration”, and the act of calling such a function or object by speech will be referred to as “speech calling”.
A user who cannot memorize the embedded speech commands can use a speech bookmark to register speech that the user can easily remember. Additionally, since a user registers speech in his or her own voice, speech recognition performance improves. This is especially helpful for a user whose voice has a specific feature that is difficult for an embedded speech recognition function to recognize.
A case where a speech bookmark function is applied to a command sequence is described below. For example, a user registers “my macro 1” by speech to the following command sequence:
1. assign “10” to a variable A
2. assign “20” to a variable B
3. assign “30” to a variable C
Consequently, the command sequence can be executed simply by saying “my macro 1” (speech calling), thus improving user-friendliness.
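The registration and calling of “my macro 1” described above can be sketched as follows. This is a minimal illustration only, not the method of the cited reference: the class name `SpeechBookmarks`, its methods, and the representation of a command as a (variable, value) pair are all hypothetical, and the recognized phrase is passed in as text rather than produced by an actual speech recognizer.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one command assigns a value to a variable.
Command = Tuple[str, str]


class SpeechBookmarks:
    """Hypothetical registry binding a registered phrase to a command sequence."""

    def __init__(self) -> None:
        self._bookmarks: Dict[str, List[Command]] = {}

    def register(self, phrase: str, commands: List[Command]) -> None:
        # Speech registration: bind the phrase to a copy of the sequence.
        self._bookmarks[phrase] = list(commands)

    def call(self, recognized_phrase: str, variables: Dict[str, str]) -> None:
        # Speech calling: execute every assignment bound to the phrase.
        for name, value in self._bookmarks[recognized_phrase]:
            variables[name] = value


bookmarks = SpeechBookmarks()
bookmarks.register("my macro 1", [("A", "10"), ("B", "20"), ("C", "30")])

state: Dict[str, str] = {}
bookmarks.call("my macro 1", state)  # one utterance replaces three commands
```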
However, a problem arises in that it is difficult to determine whether a command sequence that is actually called is the command sequence the user intended to call.
For example, the user registers “my macro 2” by speech onto the following command sequence, which is slightly different from the previous command sequence:
1. assign “10” to a variable A
2. assign “40” to a variable B
3. assign “30” to a variable C
Generally, speech recognition (speech calling) is subject to misrecognition, so that even if a user calls “my macro 1”, the device may interpret the called speech as “my macro 2”. In such a case, the command sequence corresponding to “my macro 2” is executed. However, the user may not realize that “my macro 2” has been executed instead of “my macro 1”.
To solve such a problem, when the user calls a speech bookmark, it is desirable to output to the user identification information, such as an icon, that is defined for each speech bookmark target. In the above example, if the device recognizes the called speech as “my macro 1”, an icon representing “my macro 1” is displayed. If a user calls “my macro 1” and an icon other than that of “my macro 1” is displayed, the user can immediately realize that the called speech has been misrecognized.
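The role of the identification information can be illustrated with a short sketch. It assumes, purely for illustration, that each speech bookmark target stores an icon label alongside its command sequence and that calling a bookmark returns the icon to be displayed. The names `Bookmark`, `BOOKMARKS`, `call_bookmark`, and the icon labels are hypothetical, and the misrecognition is simulated by passing the wrong phrase.

```python
from typing import Dict, List, NamedTuple, Tuple


class Bookmark(NamedTuple):
    icon: str                        # identification information shown to the user
    commands: List[Tuple[str, str]]  # (variable, value) assignments


# Hypothetical registered speech bookmarks from the two examples above.
BOOKMARKS: Dict[str, Bookmark] = {
    "my macro 1": Bookmark("icon-1", [("A", "10"), ("B", "20"), ("C", "30")]),
    "my macro 2": Bookmark("icon-2", [("A", "10"), ("B", "40"), ("C", "30")]),
}


def call_bookmark(recognized: str, variables: Dict[str, str]) -> str:
    """Execute the recognized bookmark and return the icon to display."""
    bookmark = BOOKMARKS[recognized]
    for name, value in bookmark.commands:
        variables[name] = value
    return bookmark.icon


# The user says "my macro 1", but the recognizer (simulated) returns "my macro 2".
intended = "my macro 1"
state: Dict[str, str] = {}
shown_icon = call_bookmark("my macro 2", state)

# The displayed icon differs from the one the user expects, exposing the error.
misrecognized = shown_icon != BOOKMARKS[intended].icon
```

Without the returned icon, the only trace of the error would be the value of variable B, which the user is unlikely to inspect.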
There are two methods of registering an icon to the speech bookmark target by using conventional techniques:
1. Embedding icon information into the speech bookmark target.
2. Allowing the user to edit an icon, or to select an icon from an icon list in the system, when registering speech.
The first method is used in the bookmark function of a web browser. In a Hyper Text Markup Language (HTML) document on the Web, an icon displayed with a bookmark can be described as follows:
<link rel="shortcut icon" href="shortcut.ico"/>
When a user displays content which includes the above tag and makes a bookmark registration, the web browser allocates the icon “shortcut.ico” to the bookmark.
If the above method is applied to the speech bookmark, icon information is embedded into the speech bookmark target. Consequently, when the speech bookmark target is called, the icon is displayed so that the speech bookmark target becomes easier to distinguish from other speech bookmark targets. However, it is unrealistic to allocate a different icon to each of the speech bookmark targets beforehand. For example, when a macro as described above is a speech bookmark target, a different icon has to be allocated to each of the possible combinations of set values.
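The scale of the problem can be seen by counting combinations. The sketch below assumes, for illustration only, that each of the three variables A, B, and C can take one of four values; the actual number of settable values is not stated here and would typically be larger.

```python
from itertools import product

# Assumption for illustration only: four candidate values per variable.
candidate_values = {
    "A": ["10", "20", "30", "40"],
    "B": ["10", "20", "30", "40"],
    "C": ["10", "20", "30", "40"],
}

# Every distinct macro is one combination of set values, and each combination
# would need its own icon prepared beforehand.
combinations = list(product(*candidate_values.values()))
icons_needed = len(combinations)  # 4 * 4 * 4 combinations
```

Even under this small assumption, dozens of icons would have to be prepared in advance, and the count grows multiplicatively with each additional variable or value.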
In the second method, when a user registers a speech bookmark, an icon editing screen is displayed using a graphical user interface (GUI), on which the user can edit or select an identifiable icon. The speech bookmark target becomes more identifiable if the user selects an appropriate icon. However, the user is required to perform an editing or selecting operation each time a speech bookmark is registered, which can be burdensome.
Therefore, with either conventional method, it is difficult to allocate easily identifiable identification information to a speech bookmark target without burdening the user.