1. Field of the Invention
The present invention relates to an image processing apparatus that includes an image reading apparatus for reading an image of an original, a speech recognition processing apparatus, a control method for the speech recognition processing apparatus, and a non-transitory computer-readable storage medium.
2. Description of the Related Art
In recent years, image processing apparatuses have been provided with a function for recognizing speech (a speech recognition function).
For example, when image data generated with a scanner function or the like of an image processing apparatus is to be transmitted to another image processing apparatus, a personal computer, or the like, a user can search for the destination (address) of the image data by voice.
Speech recognition requires a recognition dictionary table in which the phrases to be recognized are registered. Phrases are normally registered in the recognition dictionary table in units of words, so speech can be recognized only in units of words. To recognize speech consisting of multiple words, the speech must therefore be registered in a grammar, which is a dictionary table for recognizing multiple words as a single phrase.
For example, assume that a user wants to search by voice for the fax number of a person named “SUZUKI” whom the user has already registered, and says the two words “fax” and “suzuki” consecutively, as in “fax_suzuki”. In this case, if the phrase “fax_suzuki” has been registered, as in the conventional recognition dictionary table TB10 shown in FIG. 33, the user's speech is recognized.
However, a user does not always say the same combination of words in the same order. In the above example, the user may reverse the order of the words and say “suzuki_fax”.
The image processing apparatus can recognize only speech formed by the phrases registered in the recognition dictionary table. For this reason, if the user says “suzuki_fax”, whose word order differs from that of the phrase “fax_suzuki” registered in the recognition dictionary table, the user's speech is not recognized.
Accordingly, speech formed by multiple phrases that have the same meaning but are said in a different order sometimes cannot be recognized, which may confuse the user and reduces the user-friendliness of the image processing apparatus.
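As a rough illustration of this order sensitivity, the following Python sketch models the recognition dictionary table as a set of single words plus a grammar of registered multi-word phrases, and recognizes an utterance only on an exact match. It is a toy model under stated assumptions, not the apparatus's actual recognizer: the names `WORD_TABLE`, `GRAMMAR`, and `recognize`, and the convention of writing a multi-word utterance with `_` between words, are hypothetical.

```python
from typing import Optional

# Hypothetical toy model of a conventional recognition dictionary table.
# Single words are registered in units of words; multi-word phrases must be
# registered in a grammar as an ordered sequence of words.
WORD_TABLE = {"fax", "suzuki", "mail"}
GRAMMAR = {("fax", "suzuki")}  # only "fax_suzuki" is registered


def recognize(utterance: str) -> Optional[str]:
    """Return the matched entry, or None if the utterance is not recognized."""
    words = tuple(utterance.split("_"))  # "_" separates consecutively spoken words
    if len(words) == 1 and words[0] in WORD_TABLE:
        return words[0]
    if words in GRAMMAR:  # exact match, so word order matters
        return "_".join(words)
    return None


for said in ["fax", "fax_suzuki", "suzuki_fax"]:
    print(said, "->", recognize(said))
```

With only “fax_suzuki” registered in the grammar, “suzuki_fax” yields `None` even though both of its words appear in the word table, because the grammar entry is matched as an ordered sequence.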
Conventionally, JP 2002-108389A (hereinafter referred to as “Patent Document 1”) has proposed a method in which a surname dictionary and a first-name dictionary for speech recognition are created, and, when an individual's name is searched for by voice, speech recognition is executed separately on the surname and the first name, which are input by voice in that order.
However, even with the method disclosed in Patent Document 1, speech recognition cannot be performed correctly on the input surname and first name unless they are input by voice in the correct order. In other words, the method disclosed in Patent Document 1 cannot recognize speech formed by multiple words that have the same meaning but are said in a different order, and the user may be inconvenienced.
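The ordering problem of such a two-dictionary scheme can be sketched with a deliberately simplified Python model in which the first voice input is checked only against a surname dictionary and the second only against a first-name dictionary. This is an assumed, simplified reading of the description above, not the actual method of Patent Document 1; the dictionary contents and the function `recognize_name` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical dictionaries, applied in a fixed order: surname, then first name.
SURNAMES = {"suzuki", "tanaka"}
FIRST_NAMES = {"ichiro", "hanako"}


def recognize_name(first_said: str, second_said: str) -> Optional[Tuple[str, str]]:
    """Recognize two consecutive voice inputs, assuming surname first."""
    if first_said not in SURNAMES:  # first input is matched only against surnames
        return None
    if second_said not in FIRST_NAMES:  # second input only against first names
        return None
    return first_said, second_said


print(recognize_name("suzuki", "ichiro"))  # expected order
print(recognize_name("ichiro", "suzuki"))  # reversed order
```

Because each input position is tied to one dictionary, reversing the order causes both lookups to fail, even though each word is registered in one of the two dictionaries.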